METHODS OF CLONING AND PRODUCING FRAGMENT CHAINS WITH READABLE INFORMATION
CONTENT

The present invention relates to new methods of 
attaching first and second nucleic acid molecules, 
particularly methods of cloning in which adapter 
molecules mediate the binding between the first and 
second molecules, the resultant nucleic acid molecules 
thus formed and methods of generating DNA with a readily 
readable information content and kits for performing 
such methods.

Presently known cloning methods generally involve 
the use of restriction enzymes which are used to 
generate fragments for insertion and to cleave vectors to
produce corresponding and hence complementary terminal
sequences. Generally, the enzymes which are used cut
palindromic sequences and thus produce identical 
overhangs. Different sequences that are cut with the 
same restriction endonucleases can then be ligated 
together to form new, recombinant nucleic acids. 

However, such methods suffer from a number of 
limitations. One disadvantage in using endonucleases 
that form two identical overhangs is the formation of 
different products on ligation. If for example two 
fragments A and B are to be ligated, as a consequence of 
common overhangs the products A+A and B+B as well as the 
desired A+B will be produced. Other by-products 
resulting from other fragments produced when A and B 
were formed will also be generated, e.g. reassociation 
into the original positions. It is therefore normal to 
use a separation process using agarose gels. The 
separation procedure however often results in a 
considerable loss of DNA. 

Such methods necessarily suffer from various 
limitations including the by-products mentioned above, 
and the need to identify the desired end-products, e.g. 
if only a particular insert is to be cloned. 

Other cloning techniques have been used in which
cloning has been performed using PCR techniques, e.g. in
which the PCR primers have IIS enzyme recognition sites.
However, the use of PCR is disadvantageous in cloning
techniques as it is time consuming and requires
purification steps which result in significant loss of
yield. The PCR reaction may also introduce point
mutations and the like and the length of the fragment is
limited to the polymerase capacity, e.g. a maximum of
approximately 50kb.

It has now surprisingly been found that by
generating fragments with unique single stranded regions
and then mediating the binding between a first and
second nucleic acid molecule, many of these
disadvantages may be avoided. In this method,
restriction nucleases are used that form non-identical
overhangs, e.g. type IP or IIS restriction
endonucleases. As will be appreciated, if one uses a
restriction endonuclease that makes overhangs of 4 base
pairs, each fragment that is formed will have two
overhangs of 4 base pairs each. It is theoretically
possible therefore that 4^8 (ie. 65,536) fragments may be
formed with different combinations of the two overhangs.
Thus, as a rule, each fragment formed on cleavage will
have a unique pair of overhangs even when cleaving large
nucleic acid molecules.
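
The figure of 65,536 follows from two independent overhangs of four bases each, i.e. 4^4 possibilities per end. A minimal sketch of that arithmetic is given below; the function name overhang_pair_count is hypothetical and the calculation assumes that every base may occur at every overhang position.

```python
def overhang_pair_count(overhang_length: int, alphabet_size: int = 4) -> int:
    """Number of ordered (left, right) overhang pairs a fragment can carry.

    Assumes each overhang position can hold any of the four bases.
    """
    per_end = alphabet_size ** overhang_length  # e.g. 4**4 = 256 for one end
    return per_end * per_end                    # two ends per fragment


print(overhang_pair_count(4))  # 65536
```

The same function gives the corresponding count for longer overhangs, e.g. overhang_pair_count(6) for 6 base overhangs.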

These unique overhangs may then be addressed and
adjusted appropriately using adapters with two
overhangs. For example in a cloning technique one of
the overhangs is made to correspond to the overhang on
the insert and the other overhang is made to correspond
to the overhang on the vector into which the insert is
to be introduced. This method is outlined in Figure 1.
In that case the DNA molecule containing the insert is
cut with a restriction endonuclease which makes an
overhang on each side of the insert. Each of the many
fragments which are formed has different overhangs such
that the two overhangs at either end of the insert are
unique. Ligase is then added to bind two adapters with
corresponding single stranded regions. This leads to
the formation of two new overhangs at the termini of the
insert, which are selected such that they can be used to
bind to the vector into which the insert is to be
cloned. Providing identical overhangs are not created
on other molecules only the desired insert will be
ligated to the adapters. In the final step the insert
is ligated into the vector which has two overhangs which
complement the adapters' overhangs. The overhangs in
the vector may be constructed using the same principles
as described for the insert.

Thus in this new method, an adapter molecule is
used which is complementary to a single stranded region
generated on the first nucleic acid molecule and
therefore binds to that molecule, but has a different
single stranded region at its other terminus, thus
effectively modifying the single stranded region
presented for binding by the first nucleic acid molecule
fragment. The adapter's free single stranded region may
then mediate the binding of the first nucleic acid
molecule fragment to a second nucleic acid molecule
exhibiting a complementary single stranded region.

This method of mediation has particular
applications for effectively identifying and selecting a
first nucleic acid molecule fragment and then mediating
its binding to a second nucleic acid molecule where this
was not previously possible.

Of particular relevance to methods of cloning is
the generation of fragments for cloning which have
different single stranded regions at their termini
relative to other fragments, which may then be selected
and cloned into an appropriate vector. As described
herein, such fragments are generated by the use of
enzymes which cleave outside their recognition site and
thus produce overhangs that depend on the sequence
surrounding the recognition site which is likely to vary
from fragment to fragment.
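
The dependence of the overhang on the flanking sequence can be illustrated with a short sketch. It models a FokI-like enzyme, assuming the commonly cited FokI geometry (recognition sequence GGATG, cuts 9 bases downstream on the top strand and 13 on the bottom strand). The function name and the restriction to sites on the top strand are simplifications made for this illustration only.

```python
import re


def iis_five_prime_overhangs(seq, site="GGATG", top_cut=9, bottom_cut=13):
    """Return the single stranded stretch left by a FokI-like cut at each site.

    top_cut and bottom_cut are counted in bases from the end of the
    recognition site. Only sites on the given (top) strand are scanned.
    """
    seq = seq.upper()
    overhangs = []
    for match in re.finditer(f"(?={site})", seq):
        start = match.start() + len(site) + top_cut
        end = match.start() + len(site) + bottom_cut
        if end <= len(seq):  # skip sites too close to the end of the molecule
            overhangs.append(seq[start:end])
    return overhangs


# Same recognition site, different downstream sequence, different overhang.
print(iis_five_prime_overhangs("AAGGATGACGTACGTATTGCAAAA"))  # ['TTGC']
print(iis_five_prime_overhangs("AAGGATGACGTACGTACCAGAAAA"))  # ['CCAG']
```

Because the cut falls outside the recognition site, the two example molecules yield different four base overhangs although both carry the same site.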

Such techniques may be used to direct only a single 
fragment to a particular vector or may be used to direct 
different fragments to different sites or indeed 
different vectors, even within the same reaction mix, 
providing appropriate adapters are constructed. 

These methods have particular advantages over prior 
art methods. In particular, the whole procedure may be 
carried out in one or two steps, e.g. cutting and 
ligating simultaneously or cutting and ligating 
separately. Even in instances where the procedure is 
performed in two steps, it will often be possible to 
perform both steps in the same buffer, e.g. since T4 DNA 
ligase is known to work well in most buffers for 
restriction endonucleases. Time- and resource-consuming
precipitation procedures may therefore be avoided.
Moreover, ligations can be performed with overhangs of
4-6 bases, unlike conventional cloning where overhangs
of 0-4 bases are used, thereby increasing ligation
efficiency considerably.

Furthermore, the need to carry out gel separations 
may be avoided. The quantity of DNA required initially 
can be reduced substantially. Mutation of DNA molecules 
on UV exposure, a common occurrence in gel separation, 
may also be avoided. Furthermore, laboratory staff are 
not exposed to carcinogenic EtBr. Also, separation 
problems which can occur when restriction cleavage 
results in fragments of similar size may be avoided. 
The frequency of undesirable side-products such as empty
vectors, too many inserts or incorrect orientation of
the inserts may also be reduced.

Since it is generally not problematic if the insert
is cleaved, a small selection, e.g. of type IIS or IP
restriction endonucleases, could provide far more cloning
possibilities than a corresponding selection of ordinary
type II restriction endonucleases used for conventional
cloning procedures. Having a few type IIS, IP and
similar restriction endonucleases that cleave with high
frequency allows for many cloning possibilities.

In the specific instance of cloning of large DNA
molecules (e.g. genomic DNA) or a solution containing
many different DNA molecules in parallel (e.g. a cDNA
library) it is very difficult to use conventional
methods. If for example a large DNA molecule is cleaved
with EcoRI, a large number of fragments may be formed
with the same overhang, and in addition a considerable
proportion of these fragments may be of roughly the same
size. This may lead to the formation of a large number
of undesired ligation products, even with gel
separation. Moreover, gel separation can be difficult
if the insert is large. Furthermore, it is also often
difficult, or even impossible, to find restriction
endonucleases that will not cut large inserts. These
problems may be reduced or eliminated using the cloning
procedure described herein.

If necessary, it is possible to increase the number
of base pairs in the overhangs to, e.g., 6 by using CjeI
or similar endonucleases to form an even greater number
of possible variables and thus increase the probability
of producing unique overhangs.
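
The effect of overhang length on uniqueness can be estimated with a simple birthday-type calculation. The sketch below is an idealisation: it assumes that overhang sequences are independent and uniformly distributed, which real sequences need not be, and the function name prob_all_distinct is hypothetical.

```python
def prob_all_distinct(n_overhangs: int, overhang_length: int) -> float:
    """Probability that n randomly drawn overhangs are all different.

    Idealised model: each overhang is drawn uniformly and independently
    from the 4**overhang_length possible sequences.
    """
    space = 4 ** overhang_length
    if n_overhangs > space:
        return 0.0
    p = 1.0
    for i in range(n_overhangs):
        p *= (space - i) / space
    return p


# Compare 4 base and 6 base overhangs for the same number of fragment ends.
for length in (4, 6):
    print(length, round(prob_all_distinct(50, length), 3))
```

Under these assumptions the probability of obtaining only unique overhangs rises as the overhang is lengthened from 4 to 6 bases.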

The advantages of the method of the invention are
even greater in complex cloning procedures. If several
adapters are used for example, it is possible to clone
many different inserts into one and the same vector at a
corresponding number of different sites in one and the
same reaction, as described hereinafter in more detail.

Deletions of small or large fragments may also be
achieved using the same basic principle. This opens up
the possibility of making complex recombinations of
inter alia genomic DNA (removal of endogenous viruses in
genomes to be used for xenotransplantation, the
insertion of a large number of genes from other genomes,
new combinations of genes etc.). The method can also be
used for exon-shuffling and other recombinations that
are relevant in connection with artificial evolutionary
systems.

Thus, in a first aspect, the present invention 
provides a method of attaching a fragment of a first 
nucleic acid molecule to a second nucleic acid molecule, 
wherein said method comprises at least the steps: 

1) cleaving said first nucleic acid molecule with a
nuclease which has a cleavage site separate from its
recognition site to create at least one fragment of said
first nucleic acid molecule having a single stranded
nucleotide region (SS1a) at at least one terminus of
said fragment,

2) if necessary generating a single stranded
nucleotide region (SS2) at at least one terminus of said
second nucleic acid molecule,

3) binding to at least one single stranded region of
step 1) (SS1a) an adapter molecule comprising at one
terminus a single stranded region (SSA1) complementary
to the single stranded region of said first nucleic acid
molecule fragment (SS1a) and additionally comprising at
the other terminus a further single stranded region
(SSA2) complementary to the single stranded region (SS2)
at one terminus of said second nucleic acid molecule,

4) ligating said adapter to said first nucleic acid
fragment,

5) binding said adapter to said second nucleic acid
molecule, and

6) ligating said adapter to said second nucleic acid
molecule.
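
The binding requirements of steps 3) and 5) can be expressed as a small check. This is an illustrative sketch only: all single stranded regions are written 5' to 3', perfect Watson-Crick complementarity is assumed, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def anneals(region_a: str, region_b: str) -> bool:
    """True if two single stranded regions (both 5'->3') pair perfectly."""
    return region_a.upper() == reverse_complement(region_b)


def adapter_bridges(adapter, fragment_region: str, second_region: str) -> bool:
    """Check that an adapter (SSA1, SSA2) pairs with the fragment's single
    stranded region via SSA1 and with the second molecule's region (SS2)
    via SSA2."""
    ssa1, ssa2 = adapter
    return anneals(ssa1, fragment_region) and anneals(ssa2, second_region)


# Hypothetical overhangs chosen only to exercise the check.
print(adapter_bridges(("GCAA", "AGGT"), "TTGC", "ACCT"))  # True
print(adapter_bridges(("GCAA", "AGGT"), "TTGA", "ACCT"))  # False
```

Ligation (steps 4 and 6) is not modelled; the sketch only tests whether the non-covalent binding steps are possible for a given adapter.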

As used herein, said first and second nucleic acid 
molecules are any naturally occurring or synthetic 
polynucleotide molecules, e.g. DNA, such as genomic or 
cDNA, PNA and their analogs, which are double stranded 
and in which single stranded regions may be generated. 

Fragments of the first nucleic acid molecule are
generated by use of a nuclease which cleaves outside its
recognition site. One or more fragments may be
generated depending on the sites which are cleaved (e.g.
if the site is at the extreme end of the molecule only a
few bases may be removed rather than the production of 2
fragments). Other nucleic acid molecule fragments
described herein may be generated by any appropriate
means, as mentioned herein, including the techniques
used to produce the first nucleic acid molecule
fragments. Fragments are preferably more than 10 bases,
e.g. 10 to 200bp, preferably more than 100 bases in
length. For cloning applications, fragments having
lengths in excess of 200 bases, e.g. from 200 bases to
2kb may be used. Where longer single stranded regions
are generated, fragments of longer lengths are also
contemplated, e.g. 10-100kb or longer.

"Single stranded regions" as referred to herein are 
regions of overhang at the end, ie. at the terminus of 
the first, second or third nucleic acid molecules or 
adapter molecules. These regions are sufficient to
allow specific binding of molecules having complementary
single stranded regions and subsequent ligation between
these molecules. Thus, the single stranded regions are
at least 1 base in length, preferably at least 3 bases
in length, and more preferably at least 4 bases, e.g.
from 4 to 10 bases, e.g. 4, 5 or 6 bases in length.
Single stranded regions up to 20 bases in length are
contemplated which will allow the use of fragments in
the method of the invention which are up to Mb in length.

"Binding" as used herein refers to the step of
association of complementary single stranded regions
(ie. non-covalent binding). Subsequent "ligation" of
the sequences achieves covalent binding.

"Complementary" as used herein refers to specific 
base recognition via for example base-base
complementarity. However, complementarity as referred
to herein includes pairing of nucleotides in
Watson-Crick base-pairing in addition to pairing of
nucleoside analogs, e.g. deoxyinosine, which are capable
of specific hybridization to the base in the nucleic acid
molecules and other analogs which result in such specific
hybridization, e.g. PNA, DNA and their analogs.
Complementarity of one single stranded region to another
is considered to be sufficient when, under the
conditions used, specific binding is achieved. Thus in
the case of long single stranded regions some lack of
base-base specificity, e.g. mis-match, may be tolerated,
e.g. if one base in a series of 10 bases is not
complementary. Such slight mismatches which do not
affect the ultimate binding and ligation of the single
stranded regions are considered to be complementary for
the purposes of this invention. The single stranded
regions may retain portions, on binding, which remain
single stranded, e.g. when overhangs of different sizes
are employed or the complementary portions do not
comprise all of the single stranded regions. In such
cases, as mentioned above, providing binding can be
achieved the single stranded regions are considered to
be complementary. In those cases, prior to ligation,
missing bases may be filled in e.g. using Klenow
fragment, or other appropriate techniques as necessary.
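
The tolerance of slight mismatches may be pictured as a simple threshold on the number of non-pairing positions. The sketch below is an assumption-laden illustration: whether binding actually occurs depends on the conditions used, so the max_mismatches parameter is merely a stand-in for those conditions, and only regions of equal length are compared.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def mismatches(region_a: str, region_b: str) -> int:
    """Count non-complementary positions when two equal-length single
    stranded regions (both written 5'->3') are aligned antiparallel."""
    if len(region_a) != len(region_b):
        raise ValueError("this sketch only compares regions of equal length")
    return sum(x != y for x, y in zip(region_a.upper(), reverse_complement(region_b)))


def treated_as_complementary(region_a: str, region_b: str, max_mismatches: int = 0) -> bool:
    """Apply a mismatch tolerance; the threshold is a hypothetical parameter."""
    return mismatches(region_a, region_b) <= max_mismatches


# One non-complementary base in a series of 10, as in the example above.
print(mismatches("ACGTACGTAC", "GTACGTACGA"))                   # 1
print(treated_as_complementary("ACGTACGTAC", "GTACGTACGA", 1))  # True
```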
"Adapters" as referred to herein are molecules 
which adapt the first nucleic acid molecule fragment for
binding to a second or third nucleic acid molecule.
Adapter molecules comprise at least two regions: a
first portion containing a single stranded region which
is complementary to the single stranded region on the
first nucleic acid molecule fragment and a second
portion containing a single stranded region which is
complementary to the single stranded region on the
second nucleic acid molecule. The single stranded
regions are as described hereinbefore and are preferably
on different strands making up the adapter molecule.
The above mentioned portions are at least as large as
the single stranded regions, e.g. 4 to 6 bases in
length, although they may be longer, e.g. up to 20 bases
in length.

A linking region between these single stranded
regions is required for the stability of the molecule.
Conveniently this comprises a double stranded nucleic
acid fragment, especially in methods of cloning where
amplification, replication and/or translation are to be
performed. However, this portion may be substituted by
any appropriate molecule depending on the end use of the
resulting ligated molecule. Clearly, to achieve
ligation between the first and second nucleic acid
molecules appropriate attachment points and moieties for
ligation must be provided.

The linking portion may serve more than just a
linking function and may for example provide sequences
appropriate for primer or probe binding, e.g. for
amplification or identification, respectively, or may
contain integration sites for mobile elements such as
transposons and the like. Depending on how the method
is performed, the adapters preferably do not contain
restriction sites for any restriction enzymes used in
the method of the invention, thus avoiding the need to
inactivate or remove the enzymes prior to the addition
of the adapters.

Conveniently adapter molecules may be exclusively
comprised of a nucleic acid molecule in which the
various properties of the adapter are provided by the
different regions of the adapter.

Conveniently adapters are made up of two
complementary oligonucleotides having between 10 and 100
bases each, e.g. between 20 and 50 bases.

In the method described above, preferably at least
one first nucleic acid molecule fragment is generated
having a single stranded region at either end (SS1a and
SS1b) to each of which an adapter binds.

Preferably the method described herein is used for
cloning. Thus, in the method described above, an
adapter is bound at either end of the first nucleic acid
molecule fragment (in which the adapters may be the same
or different), and the unbound end of the first adapter
is bound to the second nucleic acid molecule and the
unbound end of the second adapter binds either to the
second nucleic acid molecule (ie. at the other end
distal to the binding of the first adapter, thereby
forming a circular molecule) or binds to a third nucleic
acid molecule. The first of these two alternatives may
arise through cleavage of a circular vector to give rise
to the second nucleic acid molecule to which the
[adapter 1] : [first nucleic acid molecule
fragment] : [adapter 2] insert is bound to re-circularize
the vector. Alternatively, a linear or circular vector
may be cleaved giving rise to two or more discrete
fragments (herein the second and third nucleic acid
molecules) which may be joined by the adapter 1 : first
nucleic acid molecule : adapter 2.

Thus, in a preferred feature, a first nucleic acid
molecule fragment is generated which has a single
stranded nucleotide region at either terminus (SS1a and
SS1b), each of which is bound by an adapter, which may
be the same or different, and the first of said adapters
is bound to said second nucleic acid molecule and the
second of said adapters binds either to said second
nucleic acid molecule or to a third nucleic acid
molecule.

Thus, alternatively stated, in a preferred
embodiment, the present invention provides a method of
cloning a fragment of a first nucleic acid molecule into
a second nucleic acid molecule, wherein said method
comprises at least the steps:

1) cleaving said first nucleic acid molecule with a
nuclease which has a cleavage site separate from its
recognition site to create one or more fragments of said
first nucleic acid molecule, wherein at least one
fragment has a single stranded nucleotide region at both
termini (SS1a and SS1b),

2) cleaving said second nucleic acid molecule to
create at least two single stranded regions (SS2a and
SS2b) at the site of said cleavage (e.g. linearizing a
circular vector or producing fragments in a linear or
circular vector),

3) binding to one of the single stranded regions of
step 1) (SS1a)

a first adapter molecule comprising at one terminus
a single stranded region (SSA1) complementary to
the single stranded region of said first nucleic
acid molecule fragment (SS1a) and additionally
comprising at the other terminus a further single
stranded region (SSA2) complementary to one of the
single stranded regions (SS2a) produced by cleavage
of said second nucleic acid molecule, and
binding to a second single stranded region of step 1)
(SS1b)

a second adapter molecule as defined above which
binds to the second single stranded region of said
first nucleic acid molecule fragment (SS1b) and to
the second single stranded region (SS2b) produced
by cleavage of said second nucleic acid molecule,

4) ligating said adapters to said first nucleic acid
fragment,

5) binding said adapters to said second nucleic acid
molecule or fragments thereof, and

6) ligating said adapters to said second nucleic acid
molecule or fragments thereof.

In instances in which cleavage of the second
nucleic acid molecule results in the production of two
or more discrete fragments which become ligated to the
first nucleic acid molecule fragment via the adapters,
said fragments constitute second and third nucleic acid
molecules of the invention.

Preferably, to prevent concatemerisation of
[adapter : first nucleic acid fragment : adapter] units, the
single stranded regions of the second and third nucleic
acid molecules which bind to these adapters are not
complementary. Thus, for example, where cloning into a
vector is performed, preferably said vector is
linearized and at least a portion of said vector is
removed from one terminus of that vector, e.g. at least
two cleavage events occur.

In such methods, particularly for cloning, the
second nucleic acid molecule, e.g. into which a first
nucleic acid molecule fragment is inserted, is
conveniently a vector (or a part thereof, e.g. where the
second and third nucleic acid molecules together
comprise the vector, and result through its cleavage).
Such vectors include any double stranded nucleic acid
molecule which may be linear or circular. (However, as
mentioned above in respect of the adapters, providing
single stranded regions exist, or are generated at the
termini of the second nucleic acid or its fragments
(e.g. the vector), the adjacent regions may be made up
of any molecule providing ligation at the termini to the
adapters is not compromised.)

Conveniently such vectors may contain sequences
which aid their use in methods of the invention or their
subsequent manipulation. Thus, vectors are conveniently
selected with only two or a small number of restriction
cleavage sites for the method of cleavage used. Thus
for example where restriction enzymes are used, the
vector is selected to include only a minimal number,
preferably only two recognition sites for that enzyme.

Vectors may additionally comprise further portions
or sequences for cloning, selection, amplification,
transcription or translation as appropriate. Thus
vectors may be used with probe or primer sites, promoter
regions, other regulatory regions, e.g. expression
control sequences etc. Conveniently well-known cloning
vectors are employed, such as pBR322 and derived
vectors, pUC vectors such as pUC19, lambda vectors, BAC,
YAC and MAC vectors and other appropriate plasmids or
viral vectors.

The molecule of which a fragment is to be inserted,
ie. the first nucleic acid molecule, may be any molecule
which can generate single stranded regions at at least
one of its ends using the nucleases described herein,
although the central portion may be varied as
appropriate. Preferably however such molecules are
double stranded nucleic acid molecules and contain
appropriate sites for the use of enzymes to create the
single stranded overhangs which are required in
accordance with the invention. Appropriately, the first
nucleic acid molecule is derived from genomic DNA and
the method of the invention is used to insert fragments
thereof into appropriate vectors.

Adapters which may be used range from short double
stranded nucleic acid molecules with single stranded
regions at their termini to longer molecules which may
contain further sequences for example to allow selection
as described hereinafter. Appropriate single stranded
regions are selected on the basis of the terminal
sequence of the first, second and third nucleic acid
molecules or fragments thereof. Appropriate selection
may also be used to direct the orientation of the
insert, e.g. to produce clones which may be used to
produce antisense nucleic acid molecules.

Adapters may be used in the methods of the
invention in which their single stranded overhangs have
already been generated, e.g. by the combination of
single stranded complementary oligonucleotides which on
hybridization leave overhangs at either end, or by
appropriate cleavage or digestion.

Alternatively, during the method of the invention,
adapters may be modified to provide single stranded
portions, e.g. by the use of restriction enzymes or
other appropriate techniques during the course of the
reaction. Conveniently, to simplify the number of
steps, the enzymes used to generate single stranded
regions in the first, second or third nucleic acid
molecules (where necessary) may be used to generate the
adapter single stranded regions.

As mentioned previously, the single stranded region 
may be 4 or more bases in length. When using longer 
overhangs or where the sequence of the full 
corresponding single stranded region of the first, 
second or third nucleic acid molecules is not known or 
unclear, a family of adapters with one or more 
degenerate bases in the single stranded region may be 
used, for example using methods to create libraries of 
adapters. Degenerate bases may also be used at 
positions prone to mis-match ligations. 

For convenience a universal library of adapters may 
be created for use in the method of the invention. Thus 
for example, 16 different adapters with a 4 base-pair 
overhang consisting of two random bases (NN) and two 
bases specific to each adapter (e.g. AA, CC, ... TT) may
be created. In this way sufficient adapters may be 
created which are capable of distinguishing between 16 
different first molecule fragment overhangs, which would 
suffice for many cloning purposes. Similarly a library 
of second molecule, e.g. vector overhangs may be 
created. 
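
Such a universal library can be enumerated mechanically. The following sketch assumes, for illustration only, that the two random bases precede the two specific bases in the overhang; the function names are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def universal_overhangs():
    """The 16 overhangs 'NN' + XY, one per specific dinucleotide XY.

    The placement of NN before the specific bases is an assumption.
    """
    return ["NN" + a + b for a, b in product(BASES, repeat=2)]


def expand(overhang: str):
    """All concrete sequences represented by an overhang containing N."""
    choices = [BASES if ch == "N" else ch for ch in overhang]
    return ["".join(p) for p in product(*choices)]


library = universal_overhangs()
print(len(library))            # 16
print(len(expand("NNAA")))     # 16 concrete sequences behind one adapter
```

Each of the 16 library members thus stands for a pool of 16 concrete overhang sequences which share the same two specific bases.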

To increase the number of permutations in an 
adapter library, two separate oligonucleotide libraries 
may be generated, one with single stranded 
oligonucleotides with regions that will correspond to 
the single stranded region of the first nucleic acid 
molecule fragment and the second library with single 
stranded oligonucleotides with regions that will 
correspond to the single stranded region of the second 
nucleic acid molecule (e.g. vector). However, common
to each member of the library is a complementary region,
such that when one member from the first library is
selected and combined with a member of the second
library, they will hybridize leaving free the relevant
single stranded regions. Thus for example to generate
an adapter with an AA overhang and a TC overhang to bind
to the first and second nucleic acid molecules
respectively, members of the different libraries such as
GGGCCCCNNAA may be combined with TCNNNCCGGGG to form: 
GGCCCCCNNAA, 
TCNNNCCGGGG 

which exhibits the appropriate overhangs. When using
only two 16 member libraries this allows the production
of 256 different adapters.
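
The combinatorial gain from pairing two libraries can be sketched as follows. The shared core sequence, the 5' to 3' orientation of both oligonucleotides and the function names are assumptions made for this illustration and are not taken from the example above.

```python
from itertools import product

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical shared core; any sequence would serve for this illustration.
CORE = "GGGCCCC"


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def oligo_library(core: str):
    """16 oligonucleotides (5'->3'): core, then NN, then a specific dinucleotide."""
    return {a + b: core + "NN" + a + b for a, b in product(BASES, repeat=2)}


def adapter_library():
    """Pair every member of the first library with every member of the second.

    The second library carries the reverse complement of the core, so that
    any member of one library can hybridize with any member of the other.
    """
    first = oligo_library(CORE)
    second = oligo_library(reverse_complement(CORE))
    return {(x, y): (first[x], second[y]) for x in first for y in second}


adapters = adapter_library()
print(len(adapters))  # 256
```

Only 32 oligonucleotides need to be synthesised in this scheme to obtain the 256 possible adapters.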

In generating appropriate adapters conveniently the
amount of mis-match which needs to be tolerated when
binding to overhangs on first, second and/or third
nucleic acid molecules should be reduced. This may
conveniently be achieved by selecting oligonucleotides
on the basis of the probability of a mismatch ligation
being generated. A computer program for achieving this
is described in more detail in Example 6. This method
allows sets of oligonucleotides to be identified which
can be used to construct chains with more than 100
fragments in a single ligation cycle but with very low
levels of mis-match. Thus in a further feature the
present invention provides computer software adapted to
identify adapter molecules for use in the method of the
invention.
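
The program of Example 6 is not reproduced in this section. Purely as an illustration of the general idea, the sketch below uses a substitute heuristic: overhangs are accepted greedily only if they differ in at least a chosen number of positions from every overhang already accepted and from the partner of every such overhang. Hamming distance is used here as a crude, assumed proxy for the probability of a mismatch ligation, and all names are hypothetical.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def select_overhangs(length: int = 4, min_distance: int = 2):
    """Greedily pick overhangs that are unlikely to cross-pair.

    A candidate is rejected if it is close to its own partner (its reverse
    complement), to an already chosen overhang, or to the partner of one.
    """
    chosen = []
    for candidate in ("".join(p) for p in product("ACGT", repeat=length)):
        if hamming(candidate, reverse_complement(candidate)) < min_distance:
            continue  # self-complementary or nearly so
        if all(
            hamming(candidate, c) >= min_distance
            and hamming(candidate, reverse_complement(c)) >= min_distance
            for c in chosen
        ):
            chosen.append(candidate)
    return chosen


selected = select_overhangs()
print(len(selected), selected[:5])
```

A greedy pass of this kind does not guarantee the largest possible set; it merely shows how a mismatch criterion can be applied when assembling a set of adapters.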

As mentioned above, the production of fragments of
said first nucleic acid molecule is achieved using a
nuclease which has a cleavage site separate from its
recognition site. In so doing, unique overhangs are
created which reflect the sequence of that molecule. In
a preferred feature, said nuclease is a class IP or IIS
restriction enzyme or functional derivatives thereof.
Such enzymes include enzymes produced synthetically
through the fusion of appropriate domains to arrive at
enzymes which cleave at a site distal to their
recognition site.

These enzymes exhibit no specificity to the
sequence that is cut and they can therefore generate
overhangs with all types of base compositions. Cleavage
with IIS enzymes results in overhangs of various lengths,
e.g. from -5 to +6 bases in length. Preferably for
performing the method of the invention, enzymes are
chosen which generate 3-6, e.g. 4 base pair overhangs.
Preferred enzymes for use in the invention include
enzymes which produce 4 base overhangs at the 3' end:
BstXI; 5 base overhangs at the 3' end: AloI, BaeI, BplI,
Bsp24I; 6 base overhangs at the 3' end: CjeI, CjePI,
HaeIV; 4 base overhangs at the 5' end: AceIII, Acc36I,
Alw26I, AlwXI, Bbr7I, BbsI, BbvI, BbvII, Bbv16II,
Bli736I, BpiI, BpuAI, BsaI, Bsc91I, BseKI, BseXI, BsmAI,
BsmBI, BsmFI, Bso31I, Bsp423I, BspBS31I, BspIS4I,
BspLU11III, BspMI, BspST5I, BspTS514I, Bst12I, Bst71I,
BstBS32I, BstGZ53I, BstTS5I, BstOZ616I, BstPZ418I,
Eco31I, EcoA4I, EcoO44I, Esp3I, FokI, PhaI, SfaNI,
Sth132I, StsI; and 5 base overhangs at the 5' end: HgaI.

Over 100 classes of IIS restriction endonucleases
have been identified and there are large variations both
with respect to substrate specificity and cleaving
pattern. In addition, these enzymes have proved to be
well suited to "module swapping" experiments so that one
can create new enzymes for particular requirements
(Huang-B, et al.; J-Protein-Chem. 1996, 15(5):481-9,
Bickle, T.A.; 1993 in Nucleases (2nd edn), Kim-YG et
al.; PNAS 1994, 91:883-887). In these experiments the
binding domain of transcription factor Sp1 was merged
with the cleavage domain of FokI to construct a class
IIS restriction endonuclease that makes a 4-base
overhang at Sp1 sites. In other experiments a class
IIS restriction endonuclease that cuts outside the
binding sites of transcription factor Ultrabithorax was
generated. Corresponding experiments have been
conducted on class I enzymes. By merging the N-terminal
part of the hsdS sub-unit of StyR 1241 (which recognizes
GAAN6RTCG) with the C-terminal part of the hsdS sub-unit
of StyR 1241 (which recognizes TCAN7RTTC) a new enzyme
that recognizes the sequence GAAN6RTTC was constructed.
Several other experiments have been carried out with
similar success. Unlike in the case of ordinary class
II enzymes, it is therefore reasonable to assume that a
number of new IIS and IP restriction enzymes can be
constructed and adapted to cloning requirements that may
arise in the future. Very many combinations and
variants of these enzymes can therefore be used
according to the principles described herein.

Generation of the single stranded regions on said
first nucleic acid fragment may be achieved directly by
cleavage of said first nucleic acid molecule with
nucleases described herein without the development of
intermediate molecules. This forms a preferred feature
of the invention. Alternatively, indirect and more
elaborate techniques may be used. For example, the
first nucleic acid molecule or a fragment thereof may be
"trimmed" using the nucleases described herein, in which
linker molecules which carry the nuclease recognition
site are bound to the first nucleic acid molecule or
fragment thereof, and cleavage outside the recognition
site results in cleavage within the first nucleic acid
molecule or fragment thereof. This method is
particularly useful since it takes advantage of the fact
that T4 DNA ligase (and also other ligases) works well
in most buffers used for restriction cutting. Ligation
and cleavage can therefore be performed simultaneously
in the same solution. Furthermore, this method allows
the generation of a unique overhang when the overhang
generated by the first cleavage step is not unique.

The trimming procedure may be initiated using an
"initiation linker", which is addressed to an overhang on
the first nucleic acid molecule or fragment thereof,
e.g. after cleavage with one or more restriction
endonucleases as described herein. As used herein, a
"linker" refers to a molecule which is similar to an
"adapter" as described herein, except that the linker
need only contain one single stranded region to allow
binding to the molecule to be trimmed. Furthermore, the
initiation linker contains one or more cleavage sites
for nucleases that cleave outside their own recognition
sequence, as described herein, for example BplI. The
first nucleic acid molecule or fragment thereof should
preferably not contain cleavage sites for the IIS
enzyme(s) used for the trimming procedure. Such
cleavage sites may alternatively be inactivated prior to
the trimming procedure (e.g. by methylation).

Propagation linkers (if used) and a termination
linker (wherein the latter may be an adapter as
described herein), T4 DNA ligase and the IIS enzyme(s)
used for the trimming may be added together with the
initiation linker. Once the initiation linker has been
ligated into position, cleavage may be effected
resulting in the generation of an overhang within the
first nucleic acid molecule or fragment thereof. If
desired (ie. if further trimming is required), a
propagation linker containing degenerate overhangs may
be used to ligate with the overhang which has been
generated. Since the linker will also carry an
appropriate nuclease recognition site, cleavage will
again produce a further cleavage site further upstream
into the first nucleic acid molecule or fragment
thereof. This process will continue until an overhang
is generated that is complementary to one of the
overhangs in the termination linker (or adapter as
described herein). This final linker will not itself
have the nuclease recognition site and will therefore
terminate trimming. As mentioned previously, this
termination linker may have an appropriate single
stranded region for binding to the adapter used in the
next step, or may itself be the adapter. An appropriate
technique for performing the trimming method may be
found in Examples 4 and 9.

The trimming method is preferably not performed
with IIS enzymes belonging to the BcgI class (e.g. BplI,
BaeI etc.) as the proteins are combined methylases and
endonucleases and the methylase function may inactivate
the binding sites on propagation linkers. Enzymes
including FokI, HgaI etc. are therefore preferred
enzymes for performing this method. If BcgI class
enzymes are to be used, the cofactor AdoMet should be
replaced with AdoHcy, sinefungin or other cofactors
that cannot function as methyl donors.

Thus in a preferred feature the invention provides
a method of removing the end terminus of a double
stranded nucleic acid molecule with at least one single
stranded region, comprising at least the steps of (i)
binding (ie. ligating) a double stranded linker molecule
containing a recognition site for a nuclease which
cleaves outside its recognition site and a single
stranded region complementary to the single stranded
region on said double stranded nucleic acid molecule to
said molecule and cleaving using said nuclease, thereby
resulting in removal of one or more bases (e.g. 3-10,
which may be in single or double stranded form, or a
combination thereof) from the terminus of said nucleic
acid molecule, (ii) optionally binding one or more
propagation linkers which contain a recognition site for
a nuclease as described above and a degenerate single
stranded region which binds to the overhang generated by
the first or subsequent cleavage steps and cleaving
using said nuclease, and (iii) adding a termination
linker which binds to the single stranded region
generated in steps (i) or (ii).
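The cycle of linker binding and cleavage in steps (i) to (iii) can be pictured with a deliberately simplified simulation. The sketch below is only a model under stated assumptions: the molecule is represented by one strand read from the end being trimmed, every cleavage removes a fixed number of bases (`step`), and `stop_overhang` is written in the same strand as the target (the termination linker would carry its complement). The function name `trim` and all parameter names are hypothetical.

```python
def trim(target: str, overhang_len: int, step: int, stop_overhang: str,
         max_cycles: int = 50):
    """Simulate repeated linker ligation and cleavage from one end of target.

    Returns (remaining_sequence, cycles_used). The first overhang_len bases
    of the returned sequence stand for the exposed single stranded region.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    remaining = target
    cycles = 0
    while True:
        overhang = remaining[:overhang_len]
        if len(overhang) < overhang_len:
            raise ValueError("target exhausted before the stop overhang appeared")
        if overhang == stop_overhang:
            # The termination linker binds here; it carries no recognition
            # site, so no further cleavage takes place.
            return remaining, cycles
        if cycles == max_cycles:
            raise RuntimeError("stop overhang not reached within max_cycles")
        # An initiation or propagation linker binds to the current overhang
        # and the nuclease cuts `step` bases further into the target.
        remaining = remaining[step:]
        cycles += 1
```

For a target `"ACGTTTGACCGGTAAT"` trimmed four bases at a time, the overhang `"CCGG"` is reached after two cleavage cycles, at which point trimming stops.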

A similar technique may be used to remove unwanted
sequences, e.g. contributed by the adapter after
ligation of the first nucleic acid molecule fragment and
second (or third) nucleic acid molecules. Various
techniques may be used to remove the unwanted sequences,
e.g. if the sequence (e.g. a region from the adapter)
contains a plant transposon sequence, this may be
removed by adding necessary transposase enzymes to
excise that sequence. Alternatively, the unwanted
sequence may be removed by taking advantage of nucleases
that cleave outside their recognition site. Thus, for
example, adapters may be used which contain recognition
sites for such enzymes which on cleavage (by appropriate
selection of cleavage site sequences), result in
overhangs generated at two distinct cleavage sites which
are complementary and thus allow concomitant excision of
the intervening sequence. Examples of techniques for
removing intervening sequences are shown in Example
5. It will be appreciated that depending on the
nuclease employed, it may be necessary to inactivate
sites for that enzyme at locations other than adjacent
to or within the intervening sequence.

Thus, in a further preferred feature, adapters as 
used herein, additionally comprise one or more nuclease 
recognition and cleavage sites whereby arrangement of 
said sequences allows, on cleavage, generation of 
complementary single stranded regions wherein each one 
of said pair of single stranded regions is generated by 
cleavage at a distinct site. 

Depending on how the different steps in the method
of the invention are performed, as described
hereinafter, where necessary the second nucleic acid
molecule and/or the adapters may also be cleaved or
digested to provide appropriate single stranded regions.
In a preferred feature, the second nucleic acid molecule
and/or the adapters are cleaved using the nucleases
described above for generating the first nucleic acid
molecule fragments. However, instead of cleavage with
such nucleases, to generate appropriate single stranded
regions and/or fragments from the second or third
nucleic acid molecules or adapters, alternative
techniques may be used. Thus for example other
restriction enzymes, non-specific nucleases or
appropriate exonucleases or mechanical methods such as
sonication or vortexing may be used. Where enzymes are
employed, small volumes are preferably used during the
reactions to increase efficiency.

Ligation between the adapters and first, second and
third nucleic acid molecules is achieved by any
appropriate technique known in the art (see for example,
Sambrook et al., in "Molecular Cloning: A Laboratory
Manual", 2nd Ed., Editor Chris Nolan, Cold Spring Harbor
Laboratory Press, 1989). For example, ligation may be
achieved chemically or by use of appropriate naturally
occurring ligases or variants thereof. Appropriate
ligases which may be used include T4 DNA ligase, and
thermostable ligases, such as Pfu, Taq, and Tth DNA
ligase. Ligation may be prevented or allowed by
controlling the phosphorylation state of the terminal
bases e.g. by appropriate use of kinases or
phosphatases. Appropriately large volumes may also be
used to avoid intermolecular ligations. Similarly, high
adapter to vector/insert ratios may be used to avoid the
vector or insert religating into its source material.

Other techniques may be used to avoid or remove
vectors which become religated or which do not cleave.
For example the insert may be cloned into a selection
marker that destroys the host bacteria unless it has
been inactivated by the insert. Alternatively
restriction cleaving using restriction enzymes specific
for the fragment removed from the vector may be
performed after the ligation step. Religated and
uncleaved vectors would be cleaved in this step. The
ideal cloning site is therefore one which contains
many unique restriction sites that are removed upon
insert ligation. Alternatively well-known techniques
may be used for identifying the desired product, e.g.
gel separation.

If the steps of cleavage and ligation are performed
together, advantageously the insert and the vector into
which it is inserted do not contain binding sites for
the nuclease used. Similarly, it is advantageous if the
fragment removed from the vector during the process of
cloning contains binding sites for the nuclease. In
that case, if that fragment religates with the vector it
would be cleaved and thereby removed again.
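Whether a given insert and vector meet these conditions can be checked in silico before the reaction is set up. The following is a minimal sketch; the function names and the three-way split into insert, vector backbone and removed fragment are assumptions made for illustration. Both strands are scanned by also searching for the reverse complement of the recognition site.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def contains_site(seq: str, site: str) -> bool:
    """True if the recognition site occurs on either strand of seq."""
    seq = seq.upper()
    site = site.upper()
    return site in seq or reverse_complement(site) in seq


def suitable_for_single_step(insert: str, vector_backbone: str,
                             removed_fragment: str, site: str) -> bool:
    """Check the conditions for combined cleavage and ligation.

    The insert and the retained vector backbone should lack the site, while
    the fragment removed from the vector should carry it, so that a
    religated vector is cut again.
    """
    return (not contains_site(insert, site)
            and not contains_site(vector_backbone, site)
            and contains_site(removed_fragment, site))
```

The sketch only looks for exact matches to a single recognition sequence; degenerate sites and methylation-sensitive enzymes would need a richer model.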

Once the first and second nucleic acid molecules
(and optionally third nucleic acid molecules) or
fragments thereof have been covalently attached, where
necessary selection of appropriate products from any
side-products may be performed. Selection may be
performed by any techniques known in the art.
Conveniently however, labelled probes may be used to
identify sequences present only in the correct product,
e.g. by probing for one or more sequences formed only
through the union of the correct sequences, e.g. a probe
directed to the junction between the adapter and the
first, second or third nucleic acid sequences.
Alternatively, the correct ligation may be detected by
functional properties bestowed on the product through
ligation, e.g. through the completion of sequences which
allow expression of a particular product once the vector
has been cloned into an appropriate host.
Alternatively, selection may be performed by sequencing
of the products which have been obtained, e.g. after
amplification and/or transformation.

Appropriate labels include any moieties which
directly or indirectly allow detection and/or
determination through the generation of a signal.
Many appropriate examples exist, including for
example radiolabels, chemical labels (e.g.
EtBr, TOTO, YOYO and other dyes), chromophores or
fluorophores (e.g. dyes such as fluorescein and
rhodamine), or reagents of high electron density such as
ferritin, haemocyanin or colloidal gold. Alternatively,
the label may be an enzyme, for example peroxidase or
alkaline phosphatase, wherein the presence of the enzyme
is visualized by its interaction with a suitable entity,
for example a substrate.

As mentioned previously, one of the significant
advantages which this method offers over known methods
is the simplification of the techniques which are
required. The steps described herein may be performed
sequentially in separate tubes (e.g. when different
enzymes are used and cross-reaction is undesirable) or
in a limited number of steps. However, ideally, the
reaction is performed in a single step. This can be
achieved by appropriate selection of enzymes, adapters
and second/third nucleic acid molecules, e.g. vectors.

Thus for example the first nucleic acid molecule
may be fragmented using a particular nuclease which is
also used to fragment the second nucleic acid molecule.
Since the enzyme used will cleave outside its
recognition site, it would be expected that the
resulting single stranded regions found on both the
first and second nucleic acid molecule fragments will be
unrelated. However, by appropriate choice of the
mediating adapters (which may also be added providing
they do not have restriction sites for that enzyme, or
that cleavage at those sites reveals appropriate single
stranded regions), these unrelated sequences may be
linked via the intermediacy of the adapters. Thus the
entire reaction may be performed in a single step.

It will also be appreciated that the adapters may
be used to address the first nucleic acid fragments to
different second nucleic acid fragments or cleavage
sites. This would therefore allow different first
nucleic acid molecule fragments to be directed and
ligated to a particular vector or site within a vector.
Thus multiple vectors (and corresponding appropriate
adapters) may be used simultaneously and take up a
single first nucleic acid molecule fragment.

Alternatively, multiple fragments or copies of the
same fragment could be inserted at different sites
within the same vector (in the latter case by the use of
adapters with one common end but with the other end
exhibiting variability to allow it to bind to different
sites within the vector). In a further alternative, the
first nucleic acid molecule fragments could be captured
in the reverse orientation (again by appropriate adapter
choice) and inserted into a vector, e.g. to produce
antisense strands.

Thus in a preferred embodiment the method described
herein is performed in a single step. The ligation
steps (ie. adapter to first nucleic acid molecule
fragment and final ligation) may however be conducted
separately once association of the relevant molecules
has been achieved. In a further preferred embodiment,
the invention provides a method of simultaneously
attaching two or more fragments of the first nucleic
acid molecule to different second nucleic acid molecules
(or different termini thereof). In cloning, this
equates to introducing the two or more fragments
into different sites in said second nucleic acid
molecules or into different second nucleic acid
molecules, e.g. into different sites within a vector or
into different vectors.

Thus the present invention provides methods in which
two or more fragments of the first
nucleic acid molecule are attached to different second
and optionally third nucleic acid molecules, or
different termini thereof. In a preferred feature,
methods are provided wherein one or more fragments of
said first nucleic acid molecule are attached via
adapters to single stranded regions in said second
nucleic acid molecule resulting from different cleavage
events. As a further preferred feature, methods are
provided wherein one or more fragments of said first
nucleic acid molecule are attached via adapters to
single stranded regions in two or more second nucleic
acid molecules.

It will be appreciated that even more complex 
reactions may be envisaged in which multiple first 
nucleic acid molecules (e.g. 2 or more, e.g. 2-10) are 
simultaneously cleaved in the same reaction and their
fragments bound to appropriate adapters which direct 
them to bind to different second nucleic acid molecules, 
e.g. different vectors or sites in vectors. 

Whilst the methods described above are especially
simplified, the same
effects may also be achieved by performing the method in
discrete steps. This is particularly appropriate where
different enzymes are used which would produce
undesirable products in other molecules. Thus for
example, different nucleases, such as restriction enzymes,
may be used to cleave the first and second nucleic acid
molecules. In such cases, the molecules are cleaved
separately, whereafter the enzymes are removed or
inactivated before the fragments are mixed together with
the adapters. Similarly, even if the same enzyme is
used, if the adapters contain enzyme sensitive sites,
the adapters could be appropriately modified to avoid
reaction, e.g. by methylation, or the enzymes used to
fragment the first and/or second nucleic acid molecules
would be inactivated or removed (as mentioned above)
prior to the addition of the adapters.

Conveniently, inactivation of enzymes may be
achieved by incubation at a temperature of at least
65°C, e.g. for 20 minutes. Alternatively, appropriate
techniques employing removal of the enzymes from the
reaction, use of chelators, inhibitors etc. may be used
to achieve inactivation.

Once appropriate clones have been generated and
selected these may be treated according to standard
methods of amplification, transformation, replication,
expression, sequencing, depending on the proposed
application of the clones. Other aspects of the
invention thus include the nucleic acid molecule product
of the method (ie. the nucleic acid molecule that is the
[first nucleic acid molecule fragment]:[adapter]:[second
nucleic acid molecule] product), such as cloning and
expression vectors comprising that nucleic acid molecule
product as well as transformed or transfected
prokaryotic or eukaryotic host cells, or transgenic
organisms containing a nucleic acid molecule produced
according to the method of the invention.

Appropriate expression vectors include appropriate
control sequences such as for example translational
(e.g. start and stop codons, ribosomal binding sites)
and transcriptional control elements (e.g.
promoter-operator regions, termination stop sequences)
linked in matching reading frame with the nucleic acid
molecules of the invention. Appropriate expression
systems are well known and documented in the art as well
as methods for their introduction and expression in
prokaryotic or eukaryotic cells or germ line or somatic
cells to form transgenic animals. Appropriate expression
vectors for transformation include bacteriophages and
viruses, such as baculovirus, adenovirus and vaccinia
viruses.

Kits for performing the methods described herein
form a preferred aspect of the invention. Thus viewed
from a further aspect the present invention provides a
kit for attaching a first nucleic acid molecule fragment
to a second nucleic acid molecule or a fragment thereof
comprising at least (i) one or more adapters as
described hereinbefore or means for producing such
adapters, (ii) the second nucleic acid molecule and
(iii) a nuclease which cleaves outside its recognition
site, wherein the terminus of one of said adapters has a
single stranded region complementary to a single
stranded region generated on said second nucleic acid
molecule after cleavage with said nuclease.

Preferably said kit comprises a library of
oligonucleotides, e.g. as described herein, particularly
as described in Example 3, from which appropriate
adapters may be generated. The library of
oligonucleotides as described herein forms a further
preferred feature of the invention. Thus for example
said library may comprise a plurality of
oligonucleotides comprising 1) a plurality of
oligonucleotides of the formula XNNNNN wherein X is one
or more bases (wherein said bases are as described
hereinbefore) and is invariant in all of said
oligonucleotides and each N is a base at the 5' end
which is varied in the different oligonucleotides, ie.
to produce 1024 variants, 2) a plurality of
oligonucleotides of the formula X'NNNN wherein X' is
complementary to X and is invariant in all of said
oligonucleotides and each N is a base at the 5' end as
described hereinbefore, 3) a plurality of
oligonucleotides of the formula YNNNNN wherein Y, which
is not the same as X, is one or more bases (wherein said
bases are as described hereinbefore) and is invariant in
all of said oligonucleotides and each N is a base at the
3' end as described hereinbefore, and 4) a plurality of
oligonucleotides of the formula Y'NNNNNN wherein Y' is
complementary to Y and is invariant in all of said
oligonucleotides and each N is a base at the 3' end as
described hereinbefore.
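The figure of 1024 variants for the formula XNNNNN follows from four possible bases at each of five variable positions (4 x 4 x 4 x 4 x 4 = 1024). A minimal sketch of enumerating such a library is given here; the function name `make_library` and the invariant sequence used in the usage note are hypothetical placeholders, and the strings simply follow the written order of the formula without modelling 5' or 3' orientation.

```python
from itertools import product


def make_library(invariant: str, n_variable: int) -> list:
    """Enumerate every oligonucleotide of the form invariant + N...N.

    Each of the n_variable positions takes each of the four bases in turn,
    giving 4 ** n_variable members.
    """
    if n_variable < 0:
        raise ValueError("n_variable must not be negative")
    return [invariant + "".join(tail)
            for tail in product("ACGT", repeat=n_variable)]
```

`make_library("GATTACA", 5)` returns 1024 distinct sequences, each beginning with the same invariant part.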

Optionally the kit may contain other appropriate
components selected from the list including ligases,
enzymes necessary for inactivation and activation of
restriction or ligation sites, primers for amplification
and/or appropriate enzymes, buffers and solutions, and a
data carrier containing a computer program to assist in
the selection of oligonucleotides from the above
mentioned library. The use of such kits for performing
the method of the invention forms a further aspect of the
invention.

The above described method may be adapted to
combine multiple first, second, third etc. nucleic acid
molecules as described below. In this method multiple
fragments are combined by appropriate selection of the
single stranded regions which appear at their ends.
This has application in the production of specific
sequences for biological purposes, but has particular
utility in the production of nucleic acid molecule
chains in which the units making up the chains each
denotes a unit of information, ie. the chains may be
used to store information, as will be described in more
detail below. As used herein "chain" refers to a serial
arrangement of fragments as described herein. Such
chains are preferably linear and include branched and
unbranched fragment sequences. Thus, for example,
branched DNA fragments may be used to provide chains
with a branched arrangement of fragments.

To produce nucleic acid molecule chains with
different unit fragments, ie. fragment chains, the
following method may be used. Firstly it is necessary
to generate fragments which have overhangs at either
end, to allow them to bind to one another. (The
ultimate 3' and 5' fragments may however have an
overhang at only the end which will become attached to
internal fragments.) As will be described in more
detail below, for certain applications appropriate
oligonucleotides may be derived from libraries in which
the members exhibit variability in at least some of
their bases. If libraries are to be produced in which
the members are double stranded, it will be appreciated
that the number of members in such a library could be
rather high. This can however effectively be reduced by
using a smaller number of smaller building blocks.

One strategy is to make two single-stranded
oligonucleotides using conventional techniques. In the
example described above (6 base double stranded linker
and 3 base overhangs at either end), oligonucleotides
having a region of 6 bases which complement each other
and so allow hybridization may be used. Since not all
of the bases of the molecules are involved in the
hybridization, the remaining bases extend beyond the
hybridizing region, thus creating single stranded regions.
Conveniently the number of required library members may
be reduced even further if repeat sequences appear with
frequency in the fragment chain. This will be described
in more detail below.

Once the appropriate double stranded chain units
(ie. fragments) have been created they may be ligated
together in the same solution, providing the different
overhangs present on the sequences are unique.

Thus in a further aspect, the present invention
provides a method of synthesizing a double stranded
nucleic acid molecule comprising at least the steps of:

1) generating n double stranded nucleic acid
fragments, wherein at least n-2 fragments have single
stranded regions at both termini and 2 fragments have
single stranded regions at at least one terminus,
wherein (n-1) single stranded regions are complementary
to (n-1) other single stranded regions, thereby
producing (n-1) complementary pairs,

2) contacting said n double stranded nucleic acid
fragments, simultaneously or consecutively, to effect
binding of said complementary pairs of single stranded
regions, and

3) optionally ligating said complementary pairs
simultaneously or consecutively to produce a nucleic
acid molecule consisting of n fragments.
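The requirement for (n-1) unique complementary pairs means that the order of the fragments in the product is fixed by the overhangs alone, however the fragments are mixed. The sketch below models this under simplifying assumptions: each fragment is described on one strand by its left junction, body and right junction, a complementary pair is represented by two fragments sharing the same junction string, and palindromic overhangs are ignored. The `Fragment` type and the `assemble` function are hypothetical names.

```python
from typing import NamedTuple, Optional


class Fragment(NamedTuple):
    left: Optional[str]   # junction at the left terminus, None if there is none
    body: str             # double stranded part, written as one strand
    right: Optional[str]  # junction at the right terminus, None if there is none


def assemble(fragments):
    """Order n fragments through their n-1 unique junctions and join them."""
    by_left = {}
    for frag in fragments:
        if frag.left is not None:
            if frag.left in by_left:
                raise ValueError(f"junction {frag.left} is not unique")
            by_left[frag.left] = frag
    starts = [frag for frag in fragments if frag.left is None]
    if len(starts) != 1:
        raise ValueError("expected exactly one fragment without a left junction")
    current = starts[0]
    chain = [current.body]
    while current.right is not None:
        partner = by_left.pop(current.right, None)
        if partner is None:
            raise ValueError(f"no partner for junction {current.right}")
        chain.append(current.right)
        chain.append(partner.body)
        current = partner
    if by_left:
        raise ValueError("some fragments were not incorporated")
    return "".join(chain)
```

Because each junction occurs only once, supplying the four fragments of a chain in any order gives the same assembled sequence, which mirrors ligating all fragments together in one solution.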

The terms "nucleic acid molecule", "single stranded
regions", "complementary", "binding" and "ligating" are
as described hereinbefore.

In step 1) reference is made to (n-1) single
stranded regions complementary to (n-1) "other" single
stranded regions. This describes two families of single
stranded regions, which together comprise 2(n-1)
members, forming n-1 pairs. Thus "other" refers to
single stranded regions in the second family which are
not present in the first family.

"Contacting" as used herein refers to bringing
together the double stranded fragments under conditions
which are conducive to association of the complementary
single stranded regions. Depending on the method used,
this may ultimately allow ligation of the fragments
carrying those regions. It should however be noted that
the fragments may be linked by methods other than
ligation. For example PCR may be used with appropriate
primers, e.g. pairs of primers.

Simultaneous or consecutive contacting and/or
ligation refers to the possibility of adding the
fragments individually or in groups to a growing chain
or simultaneously adding all n fragments together,
wherein ligation may be performed after each addition or
once all n fragments have been combined. Preferably
ligation is effected once all fragments have been
combined.

"Fragments" as used herein are as defined
hereinbefore, but preferably are shorter in length. Thus
fragments are preferably greater than 6 bases in length
(wherein said length refers to the length of each single
stranded oligonucleotide making up the fragment which
may differ slightly in length from one another), e.g.
between 6 and 50 bases, e.g. from 8 to 25 bases.

As referred to herein, "n" is an integer of at
least 4, for example at least 10 or 100, e.g. between 25
and 200.

Preferably, as mentioned above, the fragments are
generated by the use of single stranded oligonucleotides
to generate appropriate double stranded molecules.

Of particular interest in such methods is the 
production of fragment chains that may be used to store 
information in the form of code which may readily be 
accessed.

There is currently a great need for storing
information for different purposes (e.g. computer
software, music, films, databases etc.). It has
therefore been imperative to find efficient storage
media, resulting in the development of CD ROMs, DVD
technology etc. Nucleic acid molecules offer far more
efficient methods for storing information and have
several advantages over storage methods currently in
use. For example, the storage capacity of nucleic acid
molecules is vast. In principle, a test-tube containing
DNA molecules may contain as much information as several
million CD ROMs or more. Nucleic acid may be copied
quickly and efficiently using natural systems which are
greatly enhanced by techniques which have been developed
such as PCR, LCR etc. When stored appropriately,
nucleic acid molecules may be preserved for extremely
lengthy periods. Naturally existing tools for
manipulation of nucleic acid molecules are already
available for processing of the molecules, e.g.
polymerases, restriction enzymes, transcription factors,
ribosomes etc. The nucleic acid molecules may also have
catalytic properties.

Furthermore, nucleic acid molecules may be used as 
secure systems since they may be made such that they are 
not readily copied, unlike copying of current storage 
systems, e.g. CDs etc., which is increasingly prevalent. 

Previously however, it was not possible to take
advantage of the enormous potential offered by nucleic
acid molecules due to the absence of any effective
methods for writing DNA messages or reading DNA
messages. The above described method provides methods
which overcome this problem allowing the rapid synthesis
of large DNA molecules and methods of rapidly and
efficiently scanning those molecules to retrieve the
information.

The key to effective retrieval of information
encoded by the nucleic acid molecules produced according
to the method described herein is the expansion of the
information providing unit in the molecule. In nature
and in methods used previously, each base in the
sequence has an individual informational content.
Indeed methods have been described in which a single
base may signify more than a single informational unit,
e.g. in binary code, the bases A="00", C="01", G="10" and
T="11". Whilst this has advantages insofar as
significant amounts of information can be contained in a
single molecule, the system has serious drawbacks as it
requires writing and reading methods in which individual
bases may be attached and discriminated.
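For comparison with the unit-based coding introduced next, the single-base scheme mentioned above (A="00", C="01", G="10", T="11") can be written out directly. This is an illustrative sketch only; the function names are hypothetical.

```python
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_per_base(bits: str) -> str:
    """Translate a bit string into bases, two bits per base."""
    if len(bits) % 2 or set(bits) - {"0", "1"}:
        raise ValueError("bits must be an even-length string of 0s and 1s")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_per_base(bases: str) -> str:
    """Translate bases back into the bit string they represent."""
    return "".join(BASE_TO_BITS[base] for base in bases.upper())
```

The scheme is compact, but reading it back requires every individual base to be identified, which is the drawback noted above.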

In a preferred method of the invention therefore,
information units are provided which are not single
bases, but are instead short sequences. The techniques
described above allow the rapid production of such
chains and the information may be readily accessed.

Thus units representing coded information may be
generated and read. Each information unit may therefore
represent an element of code, in which the code may for
example be alphanumeric code or a simpler representation
such as binary code. In each case it is necessary for
individual elements of the code, e.g. "a", "b", "c",
"1", "0" etc. to be represented by an individualized and
specific sequence.

As used herein "information units" refer to
discrete short sequences which represent a single piece
of information, e.g. one or more (ie. combinations
thereof) elements of a code.

"Elements" of code, as mentioned above, refer to
the different members making up a code such as binary or
alphanumeric code.

Thus, in a preferred embodiment of the method of
the invention, the fragments which are linked together
comprise regions representing a unit of information
corresponding to one or more code elements. Preferably
the code is alphanumeric. Especially preferably the
code is binary. Thus for example, considering a binary
system of information capture, if one wishes to produce
chains consisting of "0", "1" fragments, appropriate
sequence combinations may be attributed to "0" or "1".

Conveniently each of said one or more code elements
(together) has the formula

(X)_a,

wherein

X is a nucleotide A, T, G, C or a derivative
thereof which allows complementary binding and may be
the same or different at each position, and

a is an integer greater than 2, e.g. greater than
4, for example from 2 to 20, preferably from 4 to 10,
e.g. 6 to 8,

wherein (X)_a is different for each one or more code
elements.

Especially preferably, in the case of binary code,
the code elements "1" and "0" may have the formulae:

"0" = (X)_a and "1" = (Y)_b,

wherein

(X)_a and (Y)_b are not identical,

X and Y are each a nucleotide A, T, G, C or a
derivative thereof which allows complementary binding
and may be the same or different at each position, and

a and b are integers greater than 2, e.g. greater
than 4, for example from 2 to 20, preferably from 4 to
10, e.g. 6 to 8.

As referred to herein, a "derivative" which is
capable of complementary binding refers to a nucleotide
analog or variant which is capable of binding to a
nucleotide present in a complementary strand, and
includes in particular naturally occurring or synthetic
variants of nucleotides, e.g. uracil or methylated,
amidated nucleotides etc.

In its simplest and preferred form, X and Y are the
same at each position, e.g. "0" = GGGGGGGG and
"1" = AAAAAAAA. However, repeat sequences such as [AC]_6A
or [GT]_6A may be used. The code sequence may also have a
functional property, e.g. it may be an integration
element such as AttP1 or AttP2.
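
Purely as an illustrative sketch, the simplest code described
above ("0" = GGGGGGGG, "1" = AAAAAAAA) may be written out as
follows. The sketch assumes that the information units are
simply concatenated and ignores the overhang or junction
sequences that would separate them in a real fragment chain;
the function names are hypothetical.

```python
# Simplest code from the text: one 8-base unit per code element.
UNIT_FOR_ELEMENT = {"0": "GGGGGGGG", "1": "AAAAAAAA"}
ELEMENT_FOR_UNIT = {unit: el for el, unit in UNIT_FOR_ELEMENT.items()}
UNIT_LENGTH = 8


def encode(elements: str) -> str:
    """Concatenate the information unit for each code element."""
    return "".join(UNIT_FOR_ELEMENT[el] for el in elements)


def decode(sequence: str) -> str:
    """Read the sequence back one whole information unit at a time."""
    units = [sequence[i:i + UNIT_LENGTH]
             for i in range(0, len(sequence), UNIT_LENGTH)]
    return "".join(ELEMENT_FOR_UNIT[unit] for unit in units)


print(encode("01"))
```

Because each element is read as a whole unit rather than as
a single base, reading does not depend on discriminating
individual bases.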

It will however be appreciated that the sequences
described above may also denote more than a single code
element. Thus for example the information unit may
denote 2 or more code elements, e.g. from 2 to 32
elements, preferably from 2 to 4 code elements. If for
example binary code is considered, each information unit
may refer to "01" or "00" or "11" or "10".

In the method described herein, chains comprising
such features may be prepared as follows. To produce a
chain with for example 8 0/1 fragments, eight "0"
starting fragments with different overhangs and eight "1"
starting fragments with different overhangs are
generated as illustrated in Figure 2. In this case "0"
fragments consist of the sequence GGGGGGGG, although
this could be replaced by other sequences. In addition
the fragments are synthesized such that they have unique
overhangs such that they may only be ligated at one
position. Thus, the fragments for position 1 in the
chain are produced such that they have an overhang which
is complemented by one of the overhangs in the fragments
for position 2. Thus, the position 2 fragments are
synthesized such that they can bind to position 1
fragments. Similarly position 3 fragments may only bind
to position 2 fragments at one of their termini and
position 4 fragments at the other terminus and so forth.
These fragments are stored separately. In order to build
up a chain, selection is made from one of the two
alternatives for each position such that an appropriate
binary chain is produced.

Thus, in the scheme outlined above, to produce a
fragment chain which represents a chain 01001011, "0"
fragments from positions 1, 3, 4 and 6 are mixed with
"1" fragments from positions 2, 5, 7 and 8. If the
fragments are then ligated together by adding ligase or
using other ligation methods mentioned previously, the
above described chain will be produced. As will be
appreciated, this chain could also be achieved using for
example only 4 fragments if the information unit carried
on each fragment denoted 2 code elements.
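
The way in which unique overhangs fix the order of the
fragments, whatever the order of mixing, may be sketched as
below. This is an illustration only: each junction is
represented by an integer label standing in for a unique
complementary overhang pair, the two outermost ends are
treated as blunt (label None) as in Figure 2, and all names
are hypothetical.

```python
import random


def fragments_for(elements: str) -> list:
    """One fragment per position; junction k joins positions k and k+1."""
    n = len(elements)
    return [
        {
            "code": element,
            "left": None if pos == 1 else pos - 1,   # blunt at chain start
            "right": None if pos == n else pos,      # blunt at chain end
        }
        for pos, element in enumerate(elements, start=1)
    ]


def ligate(pool: list) -> str:
    """Join fragments by matching junction labels, ignoring pool order."""
    by_left = {frag["left"]: frag for frag in pool}
    frag = by_left[None]
    chain = []
    while True:
        chain.append(frag["code"])
        if frag["right"] is None:
            break
        frag = by_left[frag["right"]]
    return "".join(chain)


pool = fragments_for("01001011")
random.shuffle(pool)            # mixing order does not matter
print(ligate(pool))
```

In this sketch the "0" fragments are those for positions 1,
3, 4 and 6 and the "1" fragments those for positions 2, 5, 7
and 8, as in the scheme above.
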

It is furthermore possible to combine intermediate
fragment chains (e.g. containing at least 4 fragments)
with other fragment chains which, provided appropriate
overhangs exist at their termini, may be ligated together
to form composite fragment chains. Thus, several cycles
could be conducted in parallel and the products
combined. In the method shown in Figure 2, the end
fragments have blunt ends, but clearly, appropriate
fragments could be used that similarly have overhangs at
the termini.

An appropriate technique for producing 8 fragment
chains, each containing 8 fragments, which can then be
ligated together is illustrated in Figure 3. For
fragment chain 1, end fragments are used such that it is
possible for the completed fragment chain to ligate to
fragment chain 2 and so on. These may then be combined
to produce a 64 fragment chain. Similarly, 8 such
fragment chains may be combined to produce fragment
chains comprising 512 fragments.

As will be appreciated, as with the production of
shorter chains, the step of ligation, when performed, is
conveniently effected once all the fragment chains have
been combined. However, the step of ligation may be
performed sequentially if desired on addition of each
subsequent fragment chain.

To combine 8 binary fragments per cycle, 16
different starting fragments are required, representing
the different "0", "1" alternatives at each position.
To make a chain of 64 fragments using two cycles, ie. to
produce 8 chains with 8 fragments which are then
ligated, only 16 + (4x7) = 44 starting fragments are
required. Thus, the number of different starting
fragments required increases almost linearly, whereas
the number of different fragment chains which can be
produced increases exponentially with the number of
cycles. As a consequence, very long fragment chains may
be produced with a relatively small number of starting
fragments.
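
The arithmetic above may be checked with the following
sketch. It rests on an assumed reading of the term 4x7,
namely that each of the 7 additional chains needs its own
terminal fragments (2 ends x 2 code elements = 4); the text
itself gives only the total of 44. Function names are
hypothetical.

```python
def starting_fragments(per_chain: int = 8, chains: int = 8, codes: int = 2) -> int:
    """Distinct starting fragments for `chains` chains of `per_chain` fragments.

    Assumption: the first chain needs per_chain * codes fragments and each
    further chain needs only new terminal fragments (2 ends * codes).
    """
    first_chain = per_chain * codes                 # 8 * 2 = 16
    extra_ends = (chains - 1) * 2 * codes           # 7 * 4 = 28
    return first_chain + extra_ends


def possible_chains(total_fragments: int, codes: int = 2) -> int:
    """Number of different chains of the given length."""
    return codes ** total_fragments


print(starting_fragments())        # 44
print(possible_chains(64))         # 2**64 distinct 64-fragment binary chains
```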

Of course, as mentioned previously, intermediate
chains longer or shorter than 8 may be produced. Since
a large number of permutations exist in the overhang
region, more starting fragments may be used thus
allowing larger fragments to be built up in a single
cycle. Thus, the number of cycles necessary to produce
long chains may be reduced.

Small fragment chains produced according to the
methods described herein may also be attached together
by using variations of the techniques described herein.
For example, complementary primer pairs may be used to
link the various chains as described in Example 8. In
this technique, amplification of the fragment chains is
achieved using different primer pairs. The second
primer in primer pair 1 is complementary to the first
primer in primer pair 2 and the second primer in that
pair is complementary to the first primer in primer pair
3 and so on. PCR reactions are then performed which
produce products which in single stranded form are able
to bind to one another through their complementary ends
introduced by the primer pairs. These may then be
ligated together.

Alternatively, fragment chains prepared by the
methods described herein may be amplified with a primer
which contains a restriction site for a nuclease which
cleaves outside its recognition site. These
amplification products are then digested with that
nuclease to produce non-palindromic overhangs at the end
of each fragment chain. By appropriate sequence
selection (e.g. in the primer or fragments which are
used) the overhangs which are generated allow the
different fragment chains to be combined in order.

In a preferred aspect therefore, the invention
provides a method of synthesizing a double stranded
nucleic acid molecule comprising at least the steps of:

1) generating fragment chains according to the method
described hereinbefore;

2) optionally generating single stranded regions at
the end of said fragment chains, wherein said single
stranded regions are complementary to other single
stranded regions on said fragment chains thus forming
complementary pairs of single stranded regions;

3) contacting said fragment chains with one another,
simultaneously or consecutively, to effect binding of
said complementary pairs of single stranded regions.

Optionally said chains are ligated together;
however, alternative techniques may be used to form the
ultimate chain, e.g. PCR may be used as described
herein.

Preferably intermediate fragment chains are between
4 and 20 fragments in length, e.g. 5 to 10, and between
5 and 50 such fragment chains are combined, e.g. between
10 and 20.

Conveniently fragments to be used in the method of
the invention are contained within libraries. Methods
of producing the fragments which make up the library are
well known in the art. For example a series of
oligonucleotides may be produced which comprise two
portions: a first portion which will form an overhang
at one end, and a second portion which will effect
binding to a complementary oligonucleotide and which
contains within that portion the information unit. By
producing common hybridizing portions and variant
overhangs, a series of double stranded oligonucleotides
for one or more code elements (denoted by at least a
part of the hybridizing portion) is created. This
provides a library for one (or a combination of) code
elements. Different libraries may be created for
different code elements (or combinations thereof), by
appropriate alteration of the information unit, ie. the
sequence in the hybridizing portion.

Conveniently for use in the invention, these
different double stranded oligonucleotides are arranged
in 2 dimensional arrays such that in one dimension
consecutive positions within the ultimate fragment are
indicated and in the second dimension the possible code
elements (or combinations thereof) are provided. In the
simplest case, in binary code, in which "0" and "1" are
represented by different sequences, the first dimension
would comprise fragments for each position of the
proposed fragment and the second dimension would have
only 2 variants ("0" and "1"). This may be viewed as a
single library or two libraries, ie. the "0" or "1"
libraries. Once these libraries are produced, fragment
chains with any desired order of fragments may be
readily produced.
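
One possible way of representing such a 2 dimensional
library is sketched below, for illustration only. The
fragment labels such as "pos1_0" are hypothetical
placeholders for the stored double stranded
oligonucleotides, and the function names are likewise
assumptions.

```python
from itertools import product


def build_library(n_positions: int, elements_per_fragment: int = 1,
                  alphabet: str = "01") -> dict:
    """Map (position, code element combination) to a fragment label."""
    variants = ["".join(combo)
                for combo in product(alphabet, repeat=elements_per_fragment)]
    return {
        (pos, variant): f"pos{pos}_{variant}"
        for pos in range(1, n_positions + 1)
        for variant in variants
    }


def pick(library: dict, message: str, elements_per_fragment: int = 1) -> list:
    """Select, for each position, the fragment carrying the wanted elements."""
    k = elements_per_fragment
    units = [message[i:i + k] for i in range(0, len(message), k)]
    return [library[(pos, unit)] for pos, unit in enumerate(units, start=1)]


binary = build_library(8)                 # 8 positions x 2 variants
paired = build_library(4, 2)              # 4 positions x 4 variants
print(len(binary), len(paired))
print(pick(paired, "01001011", 2))
```

With two code elements per fragment the same 8-element chain
is selected from only 4 positions, at the cost of 4 variants
per position.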

In order to appropriately direct library members to
their correct site or well (ie. the library may be
comprised of separate solid supports, or a solid support
with different addresses, e.g. wells, or different wells
containing different solutions), any appropriate sorting
technique may be used. This sorting may be achieved by
virtue of the process used for production of the library
members, or sorting may be achieved by an appropriate
technique, e.g. by binding to complementary
oligonucleotides at the relevant library site.

Appropriate solid supports suitable for attaching
library members are well known in the art and widely
described in the literature and generally speaking, the
solid support may be any of the well-known supports or
matrices which are currently widely used or proposed for
immobilization, separation etc. in chemical or
biochemical procedures. Thus for example, the
immobilizing moieties may take the form of beads,
particles, sheets, gels, filters, membranes, microfibre
strips, tubes or plates, fibres or capillaries, made for
example of a polymeric material e.g. agarose, cellulose,
alginate, teflon, latex or polystyrene. Particulate
materials, e.g. beads, are generally preferred.
Conveniently, the immobilizing moiety may comprise
magnetic particles, such as superparamagnetic particles.

In a preferred embodiment, plates or sheets are
used to allow fixation of molecules in linear
arrangement. The plates may also comprise walls
perpendicular to the plate on which molecules may be
attached. Attachment to the solid support may be
performed directly or indirectly and the technique which
is used will depend on whether the molecule to be
attached is an oligonucleotide for fixing the library
member or the library member itself. For attaching the
library members directly, ie. not via binding to an
oligonucleotide, conveniently attachment may be
performed indirectly by the use of an attachment moiety
carried on the nucleic acid molecules and/or solid
support. Thus for example, a pair of affinity binding
partners may be used, such as avidin, streptavidin or
biotin, DNA or DNA binding protein (e.g. either the lac
I repressor protein or the lac operator sequence to
which it binds), antibodies (which may be mono- or
polyclonal), antibody fragments or the epitopes or
haptens of antibodies. In these cases, one partner of
the binding pair is attached to (or is inherently part
of) the solid support and the other partner is attached
to (or is inherently part of) the nucleic acid
molecules. Alternatively, techniques of direct
attachment may be used; for example, if a filter
is used, attachment may be performed by UV-induced
crosslinking. When attaching DNA fragments, the natural
propensity of DNA to adhere to glass may also be used.

Oligonucleotides to be used for capture of the
library members may be attached to the solid support via
the use of appropriate functional groups on the solid
support.

Attachment of appropriate functional groups to the
solid support may be performed by methods well known in
the art, which include for example, attachment through
hydroxyl, carboxyl, aldehyde or amino groups which may
be provided by treating the solid support to provide
suitable surface coatings. Appropriate functional groups
may be attached to the nucleic acid molecules of the
invention by ligation, or introduced during synthesis or
amplification, for example using primers carrying an
appropriate moiety, such as biotin or a particular
sequence for capture.

In a further aspect therefore the present invention
provides a library of fragments as defined herein
comprising (n)m fragments, wherein n is as defined
hereinbefore and corresponds to the length of chain that
said library may produce, and m is an integer
corresponding to the number of possible code elements or
combinations thereof, such that fragments corresponding
to all possible code elements for each position in the
final chain are provided.

Portions of said libraries in one dimension, ie.
comprising n fragments for only a single code element
(or combinations thereof) or comprising m fragments
representing all code elements (or combinations thereof)
for a single position on the chain, form further aspects
of the invention.

Appropriate mixing may be achieved by automation.
For example in the case of "0", "1" fragments, the
correct combination of these elements is the critical
step in terms of resource- and time-consumption. This
method is described in more detail in Example 2. In
particular, the procedure may be miniaturised providing
appropriate amplifying methods (such as cloning and/or
PCR) are employed in the last step. Thus, techniques
using technology such as sorting using flow cytometers
may be employed as described in Figure 4C. Such sorting
procedures are well established and are able to sort
approximately 5-30000 droplets per second for standard
equipment, but up to 300000 droplets per second for the
most advanced cytometers.

As mentioned previously, it is possible that each
fragment may denote more than a single code element. If
for example, each fragment denotes 5 code elements,
using existing technology and a library of 32x100
library components, if 3200 containers were connected to
a sorting device as illustrated in Figure 4C, it should be
possible to write several thousand chains with 500 code
elements per second. Clearly, a method which can
generate nucleic acid sequences with such rapidity
offers significant advantages over known methods in the
art.
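
The figures quoted above can be reproduced with the
following sketch, given purely by way of illustration. It
assumes binary code, one droplet per fragment, and that the
droplet sorting rate is the only limiting factor; none of
these assumptions is stated in the text, and the function
names are hypothetical.

```python
def library_size(elements_per_fragment: int, positions: int, codes: int = 2):
    """Return (variants per position, positions, total library components)."""
    variants = codes ** elements_per_fragment      # 2**5 = 32
    return variants, positions, variants * positions


def chains_per_second(droplets_per_second: float, positions: int) -> float:
    """Assumes one droplet is needed per fragment position in each chain."""
    return droplets_per_second / positions


variants, positions, containers = library_size(5, 100)
print(variants, positions, containers)             # 32 100 3200
print(positions * 5)                               # 500 code elements per chain
print(chains_per_second(300000, positions))        # 3000.0
```

On these assumptions the fastest sorting rate mentioned
(300000 droplets per second) corresponds to about 3000
chains of 500 code elements per second.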

The nucleic acid molecule (ie. the fragment chain)
produced according to the above described method and the
single stranded molecules thereof comprise further
features of the invention. These molecules may as
appropriate be included into a vector, as described
hereinbefore.

Once produced, the fragment chains, in double
stranded or single stranded form, may be used in various
applications, as described hereinafter. One application
of particular utility is to store information. In such
cases appropriate means of reading the information
stored in those chains are required. In some
applications, fragment chains may be appropriately
addressed to particular sites, e.g. through binding to
oligonucleotides carried on solid supports which are
complementary to overhangs on one terminus of the
fragment chains. Alternatively appropriate
antibody/antigen, or DNA:protein recognition systems may
be used. Thus, information stored in molecules
addressed in this way, or in solution, may then be
accessed.

Co-pending application PCT/GB99/04417, a copy of
which is appended hereto, describes appropriate
techniques for addressing and reading information
contained in nucleic acid molecules. Of particular note
in this respect are techniques in which fluorescence of
probes carrying fluorescent labels directed to
particular sequences is detected. In such techniques,
probes, carrying labels as described hereinbefore, may
be directed to particular fragment regions, particularly
to regions denoting code elements. The signals
generated (directly or indirectly) by those labels may
then be detected and the code element thereby
identified. If a simple binary system is used only 2
discrete labels are required and their pattern of
binding may be determined. Alternatively, if a more
complex code is reflected in the fragment chains,
correspondingly more discrete labels are required for
unambiguous detection.

Thus in a further aspect, the present invention
provides a method of identifying the code elements
contained in a nucleic acid molecule prepared as
described hereinbefore (ie. fragment chain) wherein a
probe, carrying a signalling means (e.g. a label),
specific to one or more code elements, is bound to said
nucleic acid molecule and a signal generated by said
signalling means is detected, whereby said one or more
code elements may be identified.

Preferably said signalling means is a label as
described hereinbefore.

A "probe" as referred to herein refers to an
appropriate nucleic acid molecule, e.g. made up of DNA,
RNA or PNA sequences, or hybrids thereof, which is able
to bind to the target nucleic acid molecule (which may
be single or double stranded) through specific
interactions, ie. is specific to particular code
elements, e.g. through complementary binding to a
particular sequence. Probes may be any convenient
length, to allow specific binding, e.g. in the order of
5 to 50 bases, preferably 8 to 20 bases in length.

A "signalling means" as used herein refers to a
means for generating a signal directly or indirectly. A
signal may be any physical or chemical property which
may be detected, e.g. presence of a particular product,
colour, fluorescence, radiation, magnetism,
paramagnetism, electric charge, size, or volume.
Preferably the label is a fluorophore whose fluorescence
is detected. In such cases fluorescence scanners may be
used for detection of the label and thereby
identification of the code elements.

A particular code element or combination of
elements may be identified by the appearance of a
particular signal. Clearly the position of each signal
is crucial to determining the sequence of the code
elements. As a consequence methods in which positional
information (absolute or relative) may be obtained
should be used. Appropriate techniques, e.g. using
target molecules which have been attached to a solid
support at one end, are described in co-pending
application PCT/GB99/04417.

A number of applications exist for the fragment
chains once produced, inter alia in nano- and
pico-technology, for example by stretching of the
fragment chains by means of a stream of liquid,
electricity or other technology and using them as
templates for nano- and pico-structures. The products
may also be used to label products which can then be
screened to establish their identity. Alternatively,
the molecules may be used to store information, e.g.
pictures, text, music or as data storage in DNA
computers. The rapid production and reading techniques
make such applications possible for the first time.

Kits for performing the methods described above
form a preferred aspect of the invention. Thus viewed
from a further aspect the present invention provides a
kit for synthesizing a double stranded nucleic acid
molecule comprising at least n double stranded nucleic
acid fragments, wherein at least n-2 fragments have
single stranded regions at both termini and 2 fragments
have single stranded regions at at least one terminus,
wherein (n-1) single stranded regions are complementary
to (n-1) other single stranded regions, thereby
producing (n-1) complementary pairs. Preferably in
excess of n fragments are supplied for production of a
chain of n fragments, such that selection of appropriate
fragments for different positions is possible. Thus in
a preferred feature said kit comprises (n)m fragments,
wherein n is as defined hereinbefore, and m is an
integer corresponding to the number of possible
variations, e.g. unique sequences or code elements or
combinations thereof, such that fragments corresponding
to all possible sequences or code elements for each
position in the final chain are provided. Preferably
these fragments are provided in appropriate libraries
arranged with reference to their position within the
fragment chain and the code element(s) which they
represent, such that desired fragments may be readily
selected from the array.

Optionally the kit may contain other appropriate
components selected from the list including ligases,
enzymes necessary for inactivation and activation of
restriction or ligation sites, primers for amplification
and/or appropriate enzymes, buffers and solutions. The
use of such kits for performing the method of the
invention forms a further aspect of the invention.

The following examples are given by way of illustration
only in which the Figures referred to are as follows:

Figure 1 shows a schematic representation of how the
method of the invention may be used to introduce an
insert into a vector, in which the insert is cleaved
from the first nucleic acid molecule, associated with
adapters and ligated thereto and then ligated into the
vector;

WO 01/00816
PCT/GB00/02512

Figure 2 shows the production of a fragment chain using
8 "0" and "1" starting fragments with different
overhangs;

Figure 3 shows the production of a 64 fragment chain in
which 8 chains are produced comprising 8 fragments each,
in which the termini of chains 1 and 2, and 2 and 3 etc.
are complementary such that they may be ligated
together;

Figure 4 shows 3 techniques for mixing "0", "1"
fragments from a library of fragments ordered for each
position, in which in A) appropriate fragments are
selected by aspiration from appropriate wells, B)
appropriate fragments are released from the library
wells and C) a flow cytometer is used to direct
appropriate droplets to the mixing chamber;

Figure 5 shows PCR amplification of signal chain
1-0-1-0-0 using SP6 and T7 primers. Lane 1: 1 µg of 1 kb
DNA ladder (Gibco BRL), Lane 2: 10 µl of PCR amplified
fragment chain DNA using SP6 and T7 primers. Lane 3:
Same as lane 2 except for the use of SP6 and T7-Cy5
primers; and

Figure 6 shows the use of primer pairs during the
process of amplification to join together fragment
chains.

EXAMPLE 1: CLONING OF AN INSERT INTO A VECTOR, FOR
EXAMPLE FROM PHIX174 INTO PUC19

A general procedure to be followed using IIS and IP
enzymes to achieve cloning involves the use of a cloning
vector which has the following characteristics:

1) A multiple cloning site located within a gene
(lacZ, ccdB or other) that allows the detection of
successful insertion.

2) The multiple cloning site contains two flanking
HgaI sites that generate overhangs that differ from
other HgaI generated overhangs elsewhere in the vector.
The orientation of the HgaI sites ensures that the
sites themselves are excised from the vector part during
digestion. To minimize background due to undigested
plasmids, several HgaI sites and other suitable
restriction enzyme sites are included in the MCS. The
restriction enzymes are chosen such that they cleave
well in HgaI buffer and do not have other sites in the
vector.


The donor plasmid is cut with the appropriate set of IIS
and/or IP enzymes. Adapters are used to specify the
fragment to be sub-cloned into the vector, by the use of
appropriate single stranded regions on the adapters
complementary to the overhangs generated on the insert.
This results in the molecule: vector - adapter I -
insert (e.g. PhiX174 gene) - adapter II - vector.

This method is illustrated for insertion of a PhiX174
insert into a vector, e.g. pUC19. An HgaI site in a
pUC19 plasmid is chosen randomly to be our "polylinker"
while different genes and gene combinations from the
PhiX174 genome are used as "inserts".

Genes are organized in PhiX174 as illustrated below,
which shows the position of genes A, B, C and E relative
to one another:

[Diagram: relative positions of genes A, B, C and E in
the PhiX174 genome, with the BbvI sites on the bottom row]

In the above, gene B is located inside gene A while gene
C is slightly overlapping with gene A (by 3 base pairs).
Genes D and K are located in the same area as genes C and
E, but are not shown. This genome area contains 9 BbvI
sites as shown on the bottom row, in which the overhang
pairs that will be generated by cutting with BbvI are as
follows with the base pair position indicated in
brackets: 1-CAGC/GTCG (3798), 2-CTGC/GACG (4215),
3-ACGG/TGCC (4398), 4-GCAT/CGTA (4677), 5-CTAT/GATA
(5049), 6-GAGA/CTCT (158), 7-GAGC/CTCG (547),
8-CAAC/GTTG (624), 9-CCAT/GGTA (892). The parts of the
PhiX174 genome not shown contain 5 more BbvI sites:
10-TACC/ATGG (1488), 11-TACC/ATGG (1592), 12-CTAC/GATG
(1639), 13-GCAC/CGTG (3294), 14-CTAA/GATT (3297). Of
these only 12 give rise to non-identical overhangs
whilst 2 result in identical overhangs.
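
The count of identical and non-identical overhangs may be
checked with the sketch below, which is illustrative only.
The sites are numbered 1 to 14 in the order in which they
are listed above, the overhang pairs are compared exactly
as written (no strand reversal is considered), and the
variable names are hypothetical.

```python
from collections import Counter

# site number: (overhang, complementary overhang, base pair position)
BBVI_SITES = {
    1: ("CAGC", "GTCG", 3798), 2: ("CTGC", "GACG", 4215),
    3: ("ACGG", "TGCC", 4398), 4: ("GCAT", "CGTA", 4677),
    5: ("CTAT", "GATA", 5049), 6: ("GAGA", "CTCT", 158),
    7: ("GAGC", "CTCG", 547), 8: ("CAAC", "GTTG", 624),
    9: ("CCAT", "GGTA", 892), 10: ("TACC", "ATGG", 1488),
    11: ("TACC", "ATGG", 1592), 12: ("CTAC", "GATG", 1639),
    13: ("GCAC", "CGTG", 3294), 14: ("CTAA", "GATT", 3297),
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Each listed pair should be base-by-base complementary.
all_complementary = all(
    top.translate(COMPLEMENT) == bottom for top, bottom, _ in BBVI_SITES.values()
)

# Sites whose overhang pair is shared with another site.
counts = Counter((top, bottom) for top, bottom, _ in BBVI_SITES.values())
shared = sorted(
    site for site, (top, bottom, _) in BBVI_SITES.items() if counts[(top, bottom)] > 1
)
unique = len(BBVI_SITES) - len(shared)

print(all_complementary, shared, unique)
```

On this comparison sites 10 and 11 share the same overhang
pair, leaving 12 sites with overhangs not shared by any
other site.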


When HgaI is used to cleave pUC19, 4 non-identical sites
are cleaved, giving rise to 8 non-identical overhangs.
These are: 1-CTGCC/GACGG (573), 2-TTCTC/AAGAG (1131),
3-CAAGG/GTTCC (1881), 4-AGACT/TCTGA (2459).


Method:

To sub-clone gene B from Bacteriophage PhiX174 into the
designed vector, the following protocol is used:

1) 2 µg of PhiX174 DNA is digested with 2 U of BbvI (NEB)
in 1X buffer 2 (NEB), water added to a volume of 20 µl,
for 1 hr at 37°C. BbvI is then heat inactivated at 65°C
for 20 minutes.

2) 2 µg of vector (e.g. pUC19) is digested with 2 U HgaI
(NEB) in 1X buffer 1 (NEB), water added to a volume of
20 µl, for 1 hr at 37°C. HgaI is then heat inactivated at
65°C for 20 minutes.

3) The adapters are made in separate tubes by mixing
two and two oligonucleotides (selected to obtain the
desired product, ie. particular gene(s), in
forward/reverse orientation) and allowing annealing.

4) [...] of the cleavage reaction of PhiX174 is mixed
with 3 µl of the cleavage reaction of the vector and
ligated in the presence of 5-50 pmol of each adaptor,
2-10 U/µl T4 DNA Ligase (NEB), 1X ligase buffer (NEB)
and 5% Polyethylene glycol 8000, water added to a volume
of 30 µl, at 25°C for 1 hr.

5) Conventional methods are used to transform bacteria.

6) The colonies are then counted and some of them are
then picked for further analysis (sequencing, and the
like).


Materials:

Oligonucleotides used to address PhiX174 overhangs:

BbvI overhang 1a:
5'- CGA GCG CCT CCA GTG CAG CGG AG
BbvI overhang 5a:
5'- TATC GCG CCT CCA GTG CAG CGG AG
BbvI overhang 6b:
5'- CTCT GCG CCT CCA GTG CAG CGG AG
BbvI overhang 6 (delC):
5'- CTCT CTC CGC TGC ACT GGA GGC GC
BbvI overhang 7a:
5'- CAAC GCG CCT CCA GTG CAG CGG AG
BbvI overhang 9b:
5'- GGTA GCG CCT CCA GTG CAG CGG AG

Oligonucleotides used to address pUC19 overhangs:

Cloning site 1a:
5'- AAGAG CTC CGC TGC ACT GGA GGC GC
Cloning site 1b:
5'- CTCTT CTC CGC TGC ACT GGA GGC GC

Two important advantages of this recombination method
over the classical Cohen-Boyer method should be noted.
First, the procedure is very easy to perform: it involves
only mixing and incubation steps before transformation,
and no PCR amplifications or gel separations are
required. Second, the method gives significant
flexibility and allows complex recombinations to be made
even with only two restriction enzymes.

EXAMPLE 2: AUTOMATION AND MINIATURISATION OF CHAIN

This method describes a rapid process for mixing
appropriate "0" and "1" fragments with the correct
overhangs to produce a particular string consisting of
"0"s and "1"s.

Two libraries are produced, one with "0" fragments and
one with "1" fragments. As mentioned in the
description, these are generated with overhangs that can
be ligated to corresponding overhangs for fragments at
adjacent positions. These separate members are present
in separate wells to form the library, such that
position 1 fragments are present in well 1, position 2
fragments are present in well 2 and so forth. The two
libraries thus provide the alternatives for each
position. In order to generate the chain therefore it
is only necessary to select the correct fragment "0" or
"1" for position 1, and then position 2 etc. Since
these fragments, as a consequence of their unique
overhangs, may only hybridize to fragments for adjacent
positions, it is necessary only to select the correct
fragments, then mix and ligate those fragments
simultaneously. Different ways of achieving this effect
are shown in Figure 4 which shows three different
alternatives for mixing.

In Figure 4A, e.g. to produce the chain 0-1-0-0-1, the
apparatus is used to aspirate from the "0" library at
positions 1, 3 and 4, and aspirate from the "1" library
at positions 2 and 5. The liquids that have been
aspirated may then be mixed together with ligase and an
appropriate buffer. In alternative B, each well in the
library is connected with a tube/nozzle that may be
closed/opened electronically. Liquid from the nozzles
is directed into the ligation chamber together with
ligase and an appropriate buffer. Different chains may
be constructed by appropriately changing the pattern of
nozzles which are opened/closed.
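
The pattern of open and closed nozzles for a given chain
may be derived as in the following sketch, which is given
for illustration only and does not describe any particular
apparatus. The function names are hypothetical; True
denotes an open nozzle.

```python
def nozzle_pattern(chain: str, codes: str = "01") -> dict:
    """For each library, one open/closed flag per position of the chain."""
    return {code: [element == code for element in chain] for code in codes}


def open_positions(pattern: dict) -> dict:
    """1-based positions whose nozzle (or well) is used, per library."""
    return {
        code: [pos for pos, is_open in enumerate(flags, start=1) if is_open]
        for code, flags in pattern.items()
    }


pattern = nozzle_pattern("01001")          # the chain 0-1-0-0-1
print(open_positions(pattern))
```

For the chain 0-1-0-0-1 this selects positions 1, 3 and 4
of the "0" library and positions 2 and 5 of the "1"
library, with exactly one library open at each position.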

The procedure may also be miniaturised, e.g. using flow
cytometry technology as illustrated in Figure 4C. In
this method, library components are stored in containers
on top of the "writing-machine". Droplets from each
container are then guided either to the waste or
production well depending on the nature of the chain
that is to be constructed. The guiding mechanism is as
used in ordinary flow cytometers, ie. the droplets are
charged when they leave the container and may be guided
electronically in different directions.

30 

KXAMPT.K 3 - LIBRARIES COMPRISING OLIGONUCLEOTIDES FOR 
USE IN THE INVENTION 

Conveniently, the cloning method may be performed using
libraries containing oligonucleotides. For example a
library may contain:

1. Oligonucleotides with a common portion and 5 bases
at the 5' end which vary to provide all possible
permutations, i.e. 1024 variants.

2. Oligonucleotides with a common portion and 4 bases
at the 5' end which vary to provide all possible
permutations, i.e. 256 variants.

3. Oligonucleotides with a common portion and 5 bases
at the 3' end which vary to provide all possible
permutations, i.e. 1024 variants.

4. Oligonucleotides with a common portion and 6 bases
at the 3' end which vary to provide all possible
permutations, i.e. 4096 variants.
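
The variant counts quoted in the list above follow from four possible bases at each variable position (4 to the power n). A minimal sketch that enumerates the variable ends is given below; the helper name variable_ends is hypothetical.

```python
from itertools import product


def variable_ends(n):
    """All possible n-base variable ends over the alphabet A, C, G, T."""
    return ["".join(bases) for bases in product("ACGT", repeat=n)]


sizes = {n: len(variable_ends(n)) for n in (4, 5, 6)}
print(sizes)
```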

In the above, the oligonucleotides are produced such
that all "1" oligonucleotides are complementary to "2"
oligonucleotides by virtue of the invariant bases, i.e.
to generate a double stranded molecule with variant 4/5
base overhangs. Similarly "3" and "4" oligonucleotides
are complementary.

Oligonucleotides combined in this way (i.e. with
overhangs at either end of 4-6 bases) may also be
combined together with complementary double stranded
oligonucleotides also generated by combining certain
members of the library. In this way variable overhangs
of different lengths may be created in the resultant
molecule, e.g. a molecule with a 4 base overhang at both
the 3' and 5' end.

Oligonucleotides may also be provided in the library
which allow 5' and 3' adapters to be linked. Thus for
example oligonucleotides having the following form may
be provided:

5.  5'-AAAA-[comp1]-FFFFF-3'
6.  5'-DDDDD-[comp1]-FFFFF-3'
7.  5'-AAAA-[comp1]-HHHHHH-3'
8.  5'-DDDDD-[comp1]-HHHHHH-3'
9.  3'-[comp1*]-5'
10. 5'-BBBB-[comp2]-3'
11. 5'-EEEEE-[comp2*]-3'
12. 5'-[comp3]-GGGGG-3'
13. 5'-[comp3*]-IIIIII-3'



in which "compx" refers to a region which is
complementary to region "compx*", i.e. "5", "6", "7" or
"8" can bind to "9". Furthermore, "comp2" can bind to
oligonucleotide 1 above, "comp2*" can bind to
oligonucleotide 2, "comp3" can bind to oligonucleotide
"4" and "comp3*" can bind to oligonucleotide "3". The
bases denoted "A" bind to "B", i.e. "7" and "10" can bind
at their ends. Similarly "D" binds to "E", "F" binds to
"G" and "H" binds to "I". (These bases when together
may have a variable content, e.g. AAAA=GAGA and then
BBBB=TCTC.)

By appropriate use of the linkers described above, 5'
and 3' adapters may be combined. For example,
oligonucleotide "2" with a particular 4 base 5' overhang
may be bound through its complementary region to an
oligonucleotide linker "11" which will then leave an
"EEEEE" overlap. This may be bound to oligonucleotide
"8" through the overlap, which may itself bind
oligonucleotide "9" through its complementary region.
The overlap "HHHHHH" may be bound to oligonucleotide
"13" which may attach an oligonucleotide "4" through
binding to the complementary region. Thus various
permutations may be made which result in various overlap
lengths, e.g. any combination of 4, 5, or 6 base
overlaps which may be on the same or different strands.

EXAMPLE 4 - TRIMMING PROCEDURE FOR GENERATING UNIQUE
OVERHANGS

The system presented here makes it possible to perform a
trimming procedure with seven different IIS enzymes that
make 5' 4 base overhangs (FokI and Bst71I), 5' 5 base
overhangs (HgaI), 3' 5 base overhangs (BplI and BaeI)
and 3' 6 base overhangs (CjeI and HaeIV). If the
oligonucleotide system presented here is combined with
the basic oligonucleotide kit described in Example 3,
all permutations of 3' 5 base and 6 base overhangs and
all permutations of 5' 4 base and 5 base overhangs can
be addressed for the trimming procedure.

In this Example, the location of the binding motifs of
the initiation linkers is shown below:

FokI       ---GGATG---
Bst71I     --GCAGC
HgaI       GACGC
BplI       GAG CTC
BaeI       CYATG CA
CjeI       --CCA GT
HaeIV      --GAY RTC

Consensus  --GCAGCGACCATGAGTCCA-CTC--GTGGATGACGC

Initiation linkers: 



25 



X=0 : 


5 ' 


- - GCAGCGACCATGAGTCCA - 


CTC- 


-GTGGATGPPPPPP 




3 ' 


- - CGTCGCTGGTACTCAGGT - 


GAG- 


-CACCTAC 


X=l : 


5 ' 


- - GCAGCGACCATGAGTCCA - 


CTC- 


-GTGGATG-PPPPPP 




3 ' 


- - CGTCGCTGGTACTCAGGT - 


GAG- 


- CACCTAC - 


X=2 : 


5 ' 


- - GCAGCGACCATGAGTCCA - 


CTC- 


-GTGGATG- -PPPPPP 




3 ' 


- - CGTCGCTGGTACTCAGGT - 


•GAG- 


-CACCTAC- - 


X=3 : 


5 ' 


- -GCAGCGACCATGAGTCCA - 


-CTC- 


-GTGGATG PPPPPP 




3 ' 


- - CGTCGCTGGTACTCAGGT - 


-GAG- 


-CACCTAC 


X=4 : 


5 ' 


- -GCAGCGACCATGAGTCCA - 


^CTC- 


-GTGGATGACGCPPPPPP 




3 ' 


- - CGTCGCTGGTACTCAGGT - 


-GAG- 


- CACCTACTGCG 


X=5: 


5 1 


- - GCAGCGACCATGAGTCCA - 


-CTC- 


-GTGGATGACGC- PPPPPP 




3 ' 


- -CGTCGCTGGTACTCAGGT- 


-GAG- 


-CACCTACTGCG- 



X=6 : 5 — GCAGCGACCATGAGTCCA- CTC - - GTGGATGACGC - - PPPPPP 
3 1 - - CGTCGCTGGTACTCAGGT - GAG - - CACCTACTGCG - - 
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X=7 : 5 1 - - GCAGCGACCATG AGTCCA - CTC - - GTGGATGACGC PPPPPP 

3 1 - - CGTCGCTGGTACTCAGGT - GAG - - CACCTACTGCG 

X = 8 : 5 ' - - GCAGCGACCATG AGTCCA- CTC - - GTGGATGACGC PPPPPP 

3 " - - CGTCGCTGGTACTCAGGT - GAG - - CACCTACTGCG 

5 X = 9 : 5 ' - -GCAGCGACCATGAGTCCA- CTC - - GTGGATGACGC PPPPPP 

3 . _ _ CGTCGCTGGTACTCAGGT - GAG - - CACCTACTGCG 

The 6 base 3' overhang PPPPPP is a non-palindromic
sequence that can be ligated with the complementary
overhang QQQQQQ. The reason 10 different initiation
linkers are needed is that BaeI cuts 10 bases away
from its binding site. These linkers therefore allow a
trimming procedure where BaeI "jumps" 10 bases for each
trimming cycle. 10 different start positions will then
be necessary to cover all possibilities. HgaI, on the
other hand, cuts only 5 bases away, necessitating only 5
different start positions. This is the reason the
binding site for HgaI is not present on X=0 to X=3
above.
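
The relationship between jump length and the number of start positions can be illustrated with a short sketch. It assumes, purely for illustration, that each trimming cycle advances the cut site by exactly the jump length along a stretch of 60 bases; the function names and the length parameter are hypothetical.

```python
def reachable(start, jump, length):
    """Cut positions reached from `start` when each cycle advances `jump` bases."""
    return set(range(start, length, jump))


def covers_all(jump, n_starts, length=60):
    """True if start positions 0..n_starts-1 together reach every position."""
    hit = set()
    for start in range(n_starts):
        hit |= reachable(start, jump, length)
    return hit == set(range(length))


print(covers_all(10, 10), covers_all(10, 9), covers_all(5, 5))
```

Under these assumptions a 10 base jump needs all 10 start positions, whereas a 5 base jump is covered by 5.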

Propagation linkers:

FokI:    5' GGATG
         3' CCTACNNNN

Bst71I:  5' GCAGC
         3' CGTCGNNNN

HgaI:    5' GACGC
         3' CTGCGNNNNN

BplI:    5' GAG CTCNNNNN
         3' CTC GAG

BaeI:    5' CCATG CANNNNN
         3' GGTAC GT

HaeIV:   5' GAC GTCNNNNNN
         3' CTG CAG

CjeI:    5' CCA GTNNNNNN
         3' GGT CA
Termination linkers:

The adapters made with the basic oligonucleotides
described earlier can be used as termination linkers.
There is therefore no need for a separate set of
termination linkers.

Method:

In this method a trimming reaction using Bst71I that
will begin on a 3' 5 base overhang is shown. The target
DNA is shown below, in which the first overhang that will
be generated is marked "*".

          ****
3' CACTT  ****

The first Bst71I overhang in the target DNA will be
located 5-8 bases downstream of the overhang CACTT-3'. X
must therefore be 3 (see the figure below). The
following strategy can then be applied:


One linker is prepared that can address the 3' GTGAA
overhang by annealing 4-3' 6 bases (QQQQQQ) with 3-3' 5
bases (GTGAA) in one tube:

       -GTGAA-3'
3'-QQQQQQ

The 3'-GAGTGC overhang is then ligated with the X=3
initiation linker and the GTGAA-3' overhang is ligated
with the CACTT-3' overhang on the target DNA molecule:

5'--GCAGCGACCATGAGTCCA-CTC--GTGGATG---PPPPPP
3'--CGTCGCTGGTACTCAGGT-GAG--CACCTAC---QQQQQQ
                                             GTGAA
                                             CACTT
EXAMPLE 5 - REMOVAL OF INTERVENING SEQUENCES FROM
CONSTRUCTS

In some instances, constructs may be prepared which
contain undesirable nucleic acid sequences between, e.g.
the insert sequence and the vector sequence. Strategies
for removing the linker sequences should then be
applied. Illustrated below are some possible strategies
in which binding sites for restriction enzymes are
provided in the adapter sequences. Cleavage with the
restriction enzymes will then result in DNA ends that
can be religated. The vector DNA is marked as ..VVVVVV
while insert DNA is marked as IIIIIII.

Method 1

Two IIS enzymes that generate 5' 4 base overhangs (BbsI
and Esp3I):

..VVVVVVGAGC-GAGACG  GAAGAC--GAGCIIIIIIIIII
  VVVVVVCTCG-CTCTGC  CTTCTG--CTCGIIIIIIIIII..

After cleavage with BbsI and Esp3I:

..VVVVVV      +  GAGC-GAGACG  GAAGAC--      +  GAGCIIIIIIIIII
  VVVVVVCTCG         -CTCTGC  CTTCTG--CTCG         IIIIIIIIII..

After ligation with T4 DNA ligase:

GAGC-GAGACG  GAAGAC--      +  ..VVVVVVGAGCIIIIIIIIII
    -CTCTGC  CTTCTG--CTCG       VVVVVVCTCGIIIIIIIIII..
Method 2

One IIS enzyme that generates two 3' 3 base
overhangs (BsaXI):

..VVVVVVGAG  AC  CTCC  GAGIIIIIIIIII
  VVVVVVCTC  TG  GAGG  CTCIIIIIIIIII..

After cleavage with BsaXI:

..VVVVVVGAG  +       AC  CTCC  GAG  +     IIIIIIIIII
  VVVVVV        CTC  TG  GAGG          CTCIIIIIIIIII..

After ligation with T4 DNA ligase:

     AC  CTCC  GAG  +  ..VVVVVVGAGIIIIIIIIII
CTC  TG  GAGG            VVVVVVCTCIIIIIIIIII..

Method 3

One IIS enzyme that generates blunt ends (MlyI):

..VVVVVV  GAGTC  IIIIIIIIII
  VVVVVV  CTCAG  IIIIIIIIII..

After cleavage with MlyI:

..VVVVVV  +  GAGTC  +  IIIIIIIIII
  VVVVVV     CTCAG     IIIIIIIIII..

After ligation with T4 DNA ligase:



PCT/GB00/02512 



GAGTC  +  ..VVVVVVIIIIIIIIII
CTCAG       VVVVVVIIIIIIIIII..

EXAMPLE 6 - IDENTIFYING OLIGONUCLEOTIDE SETS WITH 6 BASE
PAIR OVERHANGS WITH MINIMAL MIS-MATCH LIGATIONS

In order to identify oligonucleotide sets with 6 base
pair overhangs which are unlikely to form mis-match
ligations with one another, the following steps may be
taken.

1. Create all 2048 overhang pairs of 6 bases.

2. Remove the 32 palindromic pairs.

This produces a final set of 2016 overhang pairs.
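
The counts in the two steps above can be reproduced by enumeration. The sketch below rests on one reading of the text, stated here as an assumption: a "pair" is a 6 base overhang together with its complementary overhang, so that the 4096 possible overhangs give 2048 pairs, and the 64 self-complementary (palindromic) overhangs account for the 32 pairs that are removed. The names used are illustrative only.

```python
from itertools import product

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMP)[::-1]


overhangs = ["".join(bases) for bases in product("ACGT", repeat=6)]
palindromes = [o for o in overhangs if o == revcomp(o)]
# Each non-palindromic overhang and its reverse complement form one pair.
pairs = {frozenset((o, revcomp(o))) for o in overhangs if o != revcomp(o)}

print(len(overhangs) // 2, len(palindromes) // 2, len(pairs))
```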

PART 1

1. Take a pair as pair #1 and select the next pair by
executing section 1.

Section 1
Algorithm 1

Compute the (2016 - n) tables of unweighted mismatch
scores between the already chosen n pair(s) and all
(2016 - n) remaining pairs, and find among the latter
the pair(s) for which the lowest score in the table is
the highest (see below for details about score
computation). If there is only one such pair, then
select it. If there are several pairs, then compute the
weighted mismatch scores of the overhang comparisons
that gave the lowest unweighted score and find the
pair(s) for which the lowest weighted score is the
highest. If there is only one such pair, then select
it. If there are several pairs, then redo the whole
procedure using the second lowest unweighted score in
the mismatch table, then the third lowest, and so on.
If several pairs remain tied after all mismatch scores
have been considered, keep them all.
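
The central step of algorithm 1, choosing the candidate whose lowest score against the already chosen items is the highest, can be sketched as follows. This is a simplified illustration only: it omits the weighted and nth-lowest tie-breaking rounds, and toy_score is a hypothetical stand-in that simply counts non-complementary positions rather than the scores defined later in this Example.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def toy_score(first, second):
    """Hypothetical stand-in score: number of non-complementary positions."""
    return sum(COMPLEMENT[a] != b for a, b in zip(first, second))


def next_candidates(chosen, remaining, score=toy_score):
    """Return every remaining item whose worst (lowest) score against the
    already chosen items is the highest; tied items are all kept."""
    def worst(candidate):
        return min(score(candidate, c) for c in chosen)

    best = max(worst(candidate) for candidate in remaining)
    return [candidate for candidate in remaining if worst(candidate) == best]


tied = next_candidates(["AAAAAC"], ["AAAAAG", "TTTTTG", "AGTCCC"])
print(tied)
```

In this toy run the perfectly complementary candidate TTTTTG is rejected, and the two candidates with equal worst scores are both kept.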

Repeat algorithm 1 for each selected pair and iterate it
over the desired number of positions to obtain the
chain(s) of overhang pairs. This procedure generates a
tree with an overhang pair on each branch. The lowest
unweighted and weighted mismatch scores of the
particular combination of pairs at each point are
computed. A particular pathway is stopped (1) when the
desired number of positions is reached, or (2) when the
combination of pairs is one that has already been found
earlier, or (3) when the lowest mismatch scores of that
combination are lower than the lowest scores of the
complete chain(s) already constructed. Point (3) ensures
that each new complete chain always has lowest mismatch
scores that are higher than or at least equal to those
of the previously constructed chain(s). Note also that,
as a result of this process, all pairs in a given chain
are unique and all complete chains in the tree are
unique. The whole process terminates when the last
pathway to be explored stops. Keep the complete
chain(s) whose lowest mismatch scores are the highest.

Repeat section 1 starting with each of the 2016 pairs as
pair #1 to produce a set of 2016 overhang chains. Find
the best chain(s) by applying algorithm 2.



Algorithm 2

For all chains, compute the tables of unweighted
mismatch scores between all the pairs that are present
in the chain, and find the chain(s) for which the lowest
score in the table is the highest (see below for
details). If there is only one such chain, then select
it. If there are several chains, then compute the
weighted mismatch scores of the overhang comparisons
that gave the lowest unweighted score and find the
chain(s) for which the lowest weighted score is the
highest. If there is only one such chain, then select
it. If there are several chains, then redo the whole
procedure using the second lowest unweighted score in
the mismatch table, then the third lowest, and so on.
If several chains remain tied after all mismatch scores
have been considered, then keep all of them.

This allows the production of a set of one or more
overhang chains.


PART 2

Take a chain and execute section 2.

Section 2
Algorithm 3

For that chain, find the overhang pair(s) that is (are)
responsible for the lowest unweighted and weighted
scores in the table of mismatch scores between all pairs
in the chain. Then, create new chains by substituting
that pair with all remaining overhang pairs that are not
present in the original chain (if there are several
pairs to be substituted, substitute one pair at a time).
From the complete set of newly generated chains and the
original chain, select one or more chains following
algorithm 2. Here, including the original chain in
algorithm 2 ensures that the selected chains always have
a mismatch score that is higher than or at least equal
to the score of the original chain. The improvement (if
any) may involve the lowest or nth lowest unweighted
score, or the corresponding weighted score.

Repeat algorithm 3 for each selected chain. This
procedure generates a tree with a chain on each branch.
Each new chain which is added to the tree has a mismatch
score higher than or equal to the score of the chain
found in the previous step. A particular pathway is
stopped when the selected chain is one that has already
been found earlier. This ensures that all chains in the
tree are unique. The whole process terminates when the
last pathway to be explored stops. Keep all the chains
that are present in the tree.

Repeat section 2 (i.e., construct a tree) starting with
each of the chains selected at the end of part 1.

From the whole set of chains present in all trees,
select one or more chains following algorithm 2.

This produces a final set of one or more overhang
chains.

COMPUTATION OF MISMATCH SCORES

Unweighted score

The unweighted score for a ligation between two 6-base
overhangs is the number of mismatches observed,
considering the triplets of the first 3 and the last 3
bases separately. For example, the score for the
ligation AAAAAC / TTTGCA is 0-3 and the score for
AAAAAC / TCAGGG is 2-2. All possible scores are ranked
from highest to lowest according to the order below:

highest:  3-3
          3-2/2-3
          2-2
          3-1/1-3
          2-1/1-2
          1-1
          3-0/0-3
          2-0/0-2
lowest:   1-0/0-1
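
A minimal sketch of the unweighted score and its ranking is given below. It assumes that the two overhangs are compared position by position as written, with a mismatch counted whenever the two bases are not Watson-Crick complements; the numeric ranks 9 to 1 follow the order listed above. The function and variable names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Rank of each score, keyed by the sorted triplet counts (highest = 9).
# A perfect match (0-0) is not part of the ranking listed in the text.
RANK = {(3, 3): 9, (2, 3): 8, (2, 2): 7, (1, 3): 6, (1, 2): 5,
        (1, 1): 4, (0, 3): 3, (0, 2): 2, (0, 1): 1}


def unweighted_score(first, second):
    """Mismatch counts for bases 1-3 and bases 4-6 of a 6-base ligation."""
    mismatches = [COMPLEMENT[a] != b for a, b in zip(first, second)]
    return sum(mismatches[:3]), sum(mismatches[3:])


def rank(score):
    """Numeric rank of an unweighted score such as (0, 3) or (3, 2)."""
    return RANK[tuple(sorted(score))]


print(unweighted_score("AAAAAC", "TTTGCA"), unweighted_score("AAAAAC", "TCAGGG"))
```

The two calls reproduce the scores 0-3 and 2-2 quoted in the text.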

Weighted score

The weighted score (WS) for a ligation is computed as
follows:

WS = 6 - \sum_{i=1}^{6} BPS_i

where BPS_i is the score for the particular base pair at
site i and is given in the table below:

AA = 1.0    CA = 0.6    GA = 1.0    TA = 0.0
AC = 0.6    CC = 1.0    GC = 0.0    TC = 0.6
AG = 1.0    CG = 0.0    GG = 0.9    TG = 0.2
AT = 0.0    CT = 0.6    GT = 0.2    TT = 0.6

For the perfect match between an overhang and its
complement, WS = 6.
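
The weighted score can be sketched directly from the base pair table above. The dictionary and function names are illustrative, and rounding to one decimal place is an assumption made only to avoid floating-point noise.

```python
# Base pair scores copied from the table above.
BPS = {
    "AA": 1.0, "CA": 0.6, "GA": 1.0, "TA": 0.0,
    "AC": 0.6, "CC": 1.0, "GC": 0.0, "TC": 0.6,
    "AG": 1.0, "CG": 0.0, "GG": 0.9, "TG": 0.2,
    "AT": 0.0, "CT": 0.6, "GT": 0.2, "TT": 0.6,
}


def weighted_score(first, second):
    """WS = 6 minus the summed base pair scores over the 6 sites."""
    total = sum(BPS[a + b] for a, b in zip(first, second))
    return round(6 - total, 1)


print(weighted_score("AAAAAC", "TTTTTG"), weighted_score("AAAAAC", "CAAAAA"))
```

A perfect match gives 6.0, and the comparison of AAAAAC with its own reverse gives 0.8, the value listed for overhang 1A against itself in the worked tables of this Example.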


COMPARISON AMONG PAIRS AND CONSTRUCTION OF TABLES OF
SCORES

Finding the next overhang pair

To select the next overhang pair, tables of mismatch
scores between the pairs selected at previous positions
and all remaining pairs are computed. To construct such
a table, all previously selected pairs are compared with
the new pair and also every overhang is compared with
itself. Thus, if n pairs have already been selected, the
number of ligations considered for each table is 4n +
2(n+1) = 6n+2. When comparing two overhangs that are on
the same DNA strand, one of them is reversed.
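
The count 6n+2 can be checked by listing the comparisons explicitly. In this sketch the labels such as "1A" and "3B" are only bookkeeping for the two overhangs of each pair, and the function name is hypothetical.

```python
def comparisons_for_new_pair(n):
    """Label every ligation considered when a new pair is tried after n pairs."""
    new = n + 1
    labels = []
    # Each previously selected pair against the new pair: 2 x 2 = 4 ligations.
    for old in range(1, n + 1):
        for a in "AB":
            for b in "AB":
                labels.append((f"{old}{a}", f"{new}{b}"))
    # Every overhang of all n + 1 pairs against itself: 2(n + 1) ligations.
    for pair in range(1, n + 2):
        for strand in "AB":
            labels.append((f"{pair}{strand}", f"{pair}{strand}"))
    return labels


print(len(comparisons_for_new_pair(2)))
```

For n = 2 this gives 14 comparisons, which agrees with 6n+2.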


Let us consider the following example where pairs
AAAAAC/TTTTTG (1A/1B) and AAACGT/TTTGCA (2A/2B) have
been chosen previously and the new pair AGTCCC/TCAGGG
(3A/3B) is tried at the next position:

The corresponding table is:

Comparison  Overhangs  Ligation         Unweighted  Weighted
                                        score       score
1 vs 1      1A / 1A    AAAAAC / CAAAAA  3-3         0.8
            1B / 1B    TTTTTG / GTTTTT  3-3         3.2
2 vs 2      2A / 2A    AAACGT / TGCAAA  2-2         2.8
            2B / 2B    TTTGCA / ACGTTT  2-2         4.4
3 vs 3      3A / 3A    AGTCCC / CCCTGA  2-2         3.6
            3B / 3B    TCAGGG / GGGACT  2-2         3.6
1 vs 3      1A / 3A    AAAAAC / CCCTGA  3-2         2.6
            1A / 3B    AAAAAC / TCAGGG  2-2         2.4
            1B / 3A    TTTTTG / AGTCCC  2-2         4.0
            1B / 3B    TTTTTG / GGGACT  3-2         4.6
2 vs 3      2A / 3A    AAACGT / CCCTGA  3-2         2.7
            2A / 3B    AAACGT / TCAGGG  2-2         3.3
            2B / 3A    TTTGCA / AGTCCC  2-2         3.6
            2B / 3B    TTTGCA / GGGACT  3-2         3.4

Here, the lowest score is 2-2; 2.4 given by the ligation
between overhangs 1A and 3B.

Score table for a chain

To compute the table of mismatch scores for a chain, all
overhang pairs contained in the chain are compared with
each other and also every overhang is compared with
itself. Thus, for a chain of p overhang pairs, the
number of ligations considered is 4p(p-1)/2 + 2p =
2p^2. As above, one of the two overhangs is reversed
in the comparison when both are on the same DNA strand.

For example, let us consider the following 3-pair (i.e.,
4-position) chain: AAAAAC/TTTTTG (1A/1B), AAACGT/TTTGCA
(2A/2B), AGTCCC/TCAGGG (3A/3B) in which 1A is on one
fragment, 1B and 2A are on a second fragment, 2B and 3A
are on a third fragment and 3B is on a fourth fragment.



The corresponding table is:

Comparison  Overhangs  Ligation         Unweighted  Weighted
                                        score       score
1 vs 1      1A / 1A    AAAAAC / CAAAAA  3-3         0.8
            1B / 1B    TTTTTG / GTTTTT  3-3         3.2
2 vs 2      2A / 2A    AAACGT / TGCAAA  2-2         2.8
            2B / 2B    TTTGCA / ACGTTT  2-2         4.4
3 vs 3      3A / 3A    AGTCCC / CCCTGA  2-2         3.6
            3B / 3B    TCAGGG / GGGACT  2-2         3.6
1 vs 2      1A / 2A    AAAAAC / TGCAAA  2-3         1.8
            1A / 2B    AAAAAC / TTTGCA  0-3         3.8
            1B / 2A    TTTTTG / AAACGT  0-3         5.0
            1B / 2B    TTTTTG / ACGTTT  2-3         3.8
1 vs 3      1A / 3A    AAAAAC / CCCTGA  3-2         2.6
            1A / 3B    AAAAAC / TCAGGG  2-2         2.4
            1B / 3A    TTTTTG / AGTCCC  2-2         4.0
            1B / 3B    TTTTTG / GGGACT  3-2         4.6
2 vs 3      2A / 3A    AAACGT / CCCTGA  3-2         2.7
            2A / 3B    AAACGT / TCAGGG  2-2         3.3
            2B / 3A    TTTGCA / AGTCCC  2-2         3.6
            2B / 3B    TTTGCA / GGGACT  3-2         3.4

Here, the lowest score is 0-3; 3.8 given by the ligation
between overhangs 1A and 2B.
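
The chain table for this example can be recomputed with a short script. It is a sketch under stated assumptions: A-versus-A and B-versus-B comparisons reverse the second overhang (both on the same strand) while A-versus-B comparisons do not, scores are compared first by unweighted rank and then by weighted score, and all names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Base pair scores from the weighted score table; the table is symmetric,
# so each unordered pair is listed once and mirrored.
_HALF = {"AA": 1.0, "AC": 0.6, "AG": 1.0, "AT": 0.0, "CC": 1.0,
         "CG": 0.0, "CT": 0.6, "GG": 0.9, "GT": 0.2, "TT": 0.6}
BPS = {**_HALF, **{key[::-1]: value for key, value in _HALF.items()}}


def score(first, second):
    """Return ((mismatches 1-3, mismatches 4-6), weighted score)."""
    mismatches = [COMPLEMENT[a] != b for a, b in zip(first, second)]
    unweighted = (sum(mismatches[:3]), sum(mismatches[3:]))
    weighted = round(6 - sum(BPS[a + b] for a, b in zip(first, second)), 1)
    return unweighted, weighted


def chain_table(chain):
    """All ligations considered for a chain of (A, B) overhang pairs."""
    rows = []
    for i, (a_i, b_i) in enumerate(chain, start=1):
        rows.append((f"{i}A", f"{i}A", *score(a_i, a_i[::-1])))
        rows.append((f"{i}B", f"{i}B", *score(b_i, b_i[::-1])))
        for j, (a_j, b_j) in enumerate(chain[i:], start=i + 1):
            rows.append((f"{i}A", f"{j}A", *score(a_i, a_j[::-1])))
            rows.append((f"{i}A", f"{j}B", *score(a_i, b_j)))
            rows.append((f"{i}B", f"{j}A", *score(b_i, a_j)))
            rows.append((f"{i}B", f"{j}B", *score(b_i, b_j[::-1])))
    return rows


def lowest(rows):
    """Row with the lowest unweighted rank, ties broken by weighted score."""
    return min(rows, key=lambda row: (tuple(sorted(row[2])), row[3]))


chain = [("AAAAAC", "TTTTTG"), ("AAACGT", "TTTGCA"), ("AGTCCC", "TCAGGG")]
rows = chain_table(chain)
print(len(rows), lowest(rows))
```

For p = 3 the table has 2p^2 = 18 rows, and the lowest row is the 1A/2B ligation with scores 0-3 and 3.8.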
Results obtained:

Table of breaking points

PART 1

# of        Unweighted  Weighted  # of equal
positions   score       score     chains
3           3-3         1.6       48
4           2-2         4.0       48
9           2-2         2.5       12
10          3-1         3.2       12
14          3-1         2.4       6
15          2-1         4.6       6
33          2-1         3.0       12
34          3-0         4.6       12
90          3-0         3.1

PART 2

# of        Unweighted  Weighted  # of equal
positions   score       score     chains
3           3-3         1.6       48
4           3-2         2.2       48
9           2-2         2.5       12
10          3-1         3.2       12
14          3-1         2.4       6
15          3-1         2.0       6
33          2-1         3.0       12
34          3-0         4.6       12
90


It will be noted that the unweighted mis-match score (in
which 9 = 3-3, 8 = 3-2, 7 = 2-2, 6 = 3-1, 5 = 2-1, 4 =
1-1, 3 = 3-0, 2 = 2-0, 1 = 1-0) reduces as the number of
positions increases.

Examples of chains obtained at the end of part 1 and at
the end of part 2

3 positions (this chain is obtained at the end of both
parts):

AACTCG/TTGAGC
TCTCAC/AGAGTG

4 positions:
part 1
AATTGG/TTAACC
TGCCAC/ACGGTG
ATAGTC/TATCAG

part 2
AATGGG/TTACCC
TCGGAC/AGCCTG
TTAACG/AATTGC

9 positions (this chain is obtained at the end of both
parts):

AATCAC/TTAGTG   TACACG/ATGTGC   AGGCTG/TCCGAC
TGAGGG/ACTCCC   ACATTC/TGTAAG   TTTAGC/AAATCG
TCGGAT/AGCCTA   GGCTAG/CCGATC

10 positions (this chain is obtained at the end of both
parts):

AAAACC/TTTTGG   AGGCTC/TCCGAG   TCGATA/AGCTAT
TTGGGG/AACCCC   GTCATG/CAGTAC   ATTCAG/TAAGTC
TCATAG/AGTATC   TGCAGT/ACGTCA   AGAGAT/TCTCTA


14 positions (this chain is obtained at the end of both
parts):

ACGTGC/TGCACG   GTTGGC/CAACCG   TCAGCC/AGTCGG
TATGAG/ATACTC   TTGCGG/AACGCC   AGAGGG/TCTCCC
TGCACG/ACGTGC   AGTATC/TCATAG   CACCGC/GTGGCG
ATACAC/TATGTG   TGACTA/ACTGAT
AACTTG/TTGAAC   ACTCCG/TGAGGC



15 positions:
part 1
AAAACC/TTTTGG   TGCAGT/ACGTCA   AAGTAA/TTCATT
TTGGGG/AACCCC   TCGATA/AGCTAT   CCGTCC/GGCAGG
TCATAG/AGTATC   ATTCAG/TAAGTC   TGTAAC/ACATTG
AGGCTC/TCCGAG   AGAGAT/TCTCTA   ACCGTG/TGGCAC
GTCATG/CAGTAC   TACTTC/ATGAAG

part 2
AAAACC/TTTTGG   TCTGCT/AGACGA   AAGTAA/TTCATT
TTGGGG/AACCCC   TCGATA/AGCTAT   CCGTCC/GGCAGG
TCATAG/AGTATC   ATTCAG/TAAGTC   TGTAAC/ACATTG
AGGCTC/TCCGAG   AGAGAT/TCTCTA   ACCGTG/TGGCAC
GACAAG/CTGTTC   TACTTC/ATGAAG


33 positions (this chain is obtained at the end of both
parts):

AACTAG/TTGATC   GTAAGG/CATTCC   TCGCCT/AGCGGA
TGGAGC/ACCTCG   AAACTA/TTTGAT   TCTCGG/AGAGCC
TCAAAT/AGTTTA   GTCTCC/CAGAGG   ACCCCC/TGGGGG
CAGGCC/GTCCGG   ACAGCG/TGTCGC   TTTTCG/AAAAGC
TATCAC/ATAGTG   CACATC/GTGTAG   AAGTCA/TTCAGT
AGATTC/TCTAAG   TGTGTA/ACACAT   GTTCTC/CAAGAG
TTCCGT/AAGGCA   TAATGC/ATTACG
CCCACG/GGGTGC   GGTAAG/CCATTC
ATGCCG/TACGGC   AGTTAT/TCAATA
TCCGTC/AGGCAG   CAACAG/GTTGTC
CCACGC/GGTGCG   ATCGGC/TAGCCG
ACTATG/TGATAC   AATGCT/TTACGA
TTAGCA/AATCGT   TTGGAG/AACCTC



34 positions (this chain is obtained at the end of both
parts):

AACTCT/TTGAGA   TTATTC/AATAAG   CCAATC/GGTTAG
TCGAAC/AGCTTG   CACAAG/GTGTTC   ACTTAT/TGAATA
CAGGGC/GTCCCG   TCCGAT/AGGCTA   AAAGAG/TTTCTC
TAAAGG/ATTTCC   AGTAGC/TCATCG   TTGATA/AACTAT
TGTGCG/ACACGC   CCGTCG/GGCAGC   AAGACC/TTCTGG
ATGTAG/TACATC   TCACTA/AGTGAT   CAATCC/GTTAGG
TTCCCC/AAGGGG   GTGACG/CACTGC   TCTCGC/AGAGCG
AATCTC/TTAGAG   TGAAAT/ACTTTA   AGGGGG/TCCCCC
TGGCGT/ACCGCA   AGCATG/TCGTAC   TGCCAG/ACGGTC
GGCTGC/CCGACG   ACCGTC/TGGCAG   TACTAC/ATGATG
TTTGAC/AAACTG
ACACCG/TGTGGC
TGAGGC/ACTCCG



90 positions (this chain is obtained at the end of part
1):

AAAAAA/TTTTTT   TCTGGC/AGACCG   AAACGG/TTTGCC
CCGGCC/GGCCGG   ACGCAG/TGCGTC   TTTGCC/AAACGG
AGGTAG/TCCATC   TGCGTC/ACGCAG   AACCAA/TTGGTT
TCCATC/AGGTAG   AGTCAT/TCAGTA   CAAAAC/GTTTTG
ATCTGC/TAGACG   TCAGTA/AGTCAT   AAGGAA/TTCCTT
TAGACG/ATCTGC   CAGCCG/GTCGGC   CGCCGC/GCGGCG
ACTGTG/TGACAC   GTCGGC/CAGCCG   AGTGCG/TCACGC
TGACAC/ACTGTG   AATTTC/TTAAAG   TCACGC/AGTGCG
CATTAC/GTAATG   TTAAAG/AATTTC   ATTTTA/TAAAAT
ACCCCA/TGGGGT   CCAACG/GGTTGC   ATCCTA/TAGGAT
ATGGTA/TACCAT   GGTTGC/CCAACG   AGTATC/TCATAG
CGAAGC/GCTTCG   CACCAC/GTGGTG   TCATAG/AGTATC
ATTACC/TAATGG   AGAATA/TCTTAT   ATGTGG/TACACC
TAATGG/ATTACC   TCTTAT/AGAATA   TACACC/ATGTGG
CTCCTC/GAGGAG   ATCAAT/TAGTTA   ATGCAC/TACGTG
AGTTGA/TCAACT   TAGTTA/ATCAAT   TACGTG/ATGCAC
AATGCT/TTACGA   ACTTCA/TGAAGT   ACTAAC/TGATTG
TTACGA/AATGCT   AGCCCC/TCGGGG   TGATTG/ACTAAC
AAGCGC/TTCGCG   TCGGGG/AGCCCC   CAGTGC/GTCACG
TTCGCG/AAGCGC   ACCATG/TGGTAC   GTCACG/CAGTGC
CCCAAG/GGGTTC   TGGTAC/ACCATG   AATAAG/TTATTC
GGGTTC/CCCAAG   AGGGGA/TCCCCT   TTATTC/AATAAG
ACATCC/TGTAGG   CTAATC/GATTAG   AGATAT/TCTATA
TGTAGG/ACATCC   CGAGAG/GCTCTC   TCTATA/AGATAT
AACTTG/TTGAAC   GCTCTC/CGAGAG   AAGTCG/TTCAGC
TTGAAC/AACTTG   ACACGT/TGTGCA   TTCAGC/AAGTCG
ATAGAC/TATCTG   TGTGCA/ACACGT   AATCGA/TTAGCT
TATCTG/ATAGAC   CCTGTC/GGACAG   TTAGCT/AATCGA
AGACCG/TCTGGC   GGACAG/CCTGTC   AGGCTC/TCCGAG
CGGGGC/GCCCCG



EXAMPLE 7 - CONSTRUCTION OF A 5-FRAGMENT CHAIN ENCODING
THE BINARY SEQUENCE 1-0-1-0-0

This experiment demonstrates the construction of a
specific 5 fragment chain using a set of four
non-palindromic 5' 6 base overhang pairs. The set of
four unique overhang pairs was found using a computer
program as described in Example 6.

Based upon the overhang pairs, a set of five library
components was made by annealing complementary
oligonucleotides in separate tubes:

signal 1:
5'-TAATACGACTCACTATACCACAAGTTTGTACAAAAAAGCAGGCTCTATTC-3'
and
5'-TAGGAAGAATAGAGCCTGCTTTTTTGTACAAACTTGTGGTATAGTGAGTCGTATTA-3';

signal 2:
5'-TTCCTATGCAGTGGACCACTTTGTACAAGAAAGCTGGGTTGCAGT-3' and
5'-GCAACTACTGCAACCCAGCTTTCTTGTACAAAGTGGTCCACTGCA-3';

signal 3:
5'-AGTTGCTTGACGCCACAAGTTTGTACAAAAAAGCAGGCTTTGACG-3' and
5'-CGACATCGTCAAAGCCTGCTTTTTTGTACAAACTTGTGGCGTCAA-3';

signal 4:
5'-ATGTCGAAGGGCGGACCACTTTGTACAAGAAAGCTGGGTAAGGGC-3' and
5'-GACAGGGCCCTTACCCAGCTTTCTTGTACAAAGTGGTCCGCCCTT-3';

signal 5:
5'-CCTGTCATGTGGACCACTTTGTACAAGAAAGCTGGGTTTCTATAGTGTCACCTAAATC-3'
and
5'-GATTTAGGTGACACTATAGAAACCCAGCTTTCTTGTACAAAGTGGTCCACAT-3';

T7 primer:     5'-TAATACGACTCACTATACCA-3'
T7-Cy5 primer: 5'-TAATACGACTCACTATA-3'
SP6 primer:    3'-AAGATATCACAGTGGATTTAG-5'

The library components (4 pmol each) were then mixed
together and ligated using 100 U T4 DNA ligase (NEB) in
1X ligase buffer at 25°C for 15 minutes. The ligase was
then inactivated at 65°C for 20 min.

5 µl of the ligation reaction (50 µl) was used as template
in a PCR reaction (50 µl) containing 1X Thermopol buffer
(NEB), 0.05 mM dNTPs, 0.4 µM T7 primer, 0.4 µM SP6
primer and 0.04 U/µl Vent polymerase (NEB). The PCR was
hot started (95°C for 3 minutes before addition of
polymerase) and cycled 30 times (95°C, 30 sec; 55°C, 30
sec; 76°C, 30 sec) using a PTC-200 thermocycler (MJ
Research). 10 µl of the PCR was analysed on a 1.5%
agarose gel as shown in Figure 5. The gel picture showed
only one intense band corresponding to approximately 240
bp as expected (243 bp). The remaining PCR product was
extracted twice with chloroform and precipitated using
71% ethanol and 0.1M NaAc. The DNA was dissolved in
water and sequenced. The sequence confirmed that the
expected signal chain (1-0-1-0-0) was generated.
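
The expected product length and the compatibility of adjacent overhangs can be checked from the oligonucleotide sequences listed for signals 1 to 5. The sketch below assumes that the first-listed oligonucleotide of each signal is the upper strand and that the first six bases of the second-listed oligonucleotide form the 5' overhang; the variable names are illustrative.

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMP)[::-1]


# Upper-strand oligonucleotides of signals 1 to 5, as listed in this Example.
TOP = [
    "TAATACGACTCACTATACCACAAGTTTGTACAAAAAAGCAGGCTCTATTC",
    "TTCCTATGCAGTGGACCACTTTGTACAAGAAAGCTGGGTTGCAGT",
    "AGTTGCTTGACGCCACAAGTTTGTACAAAAAAGCAGGCTTTGACG",
    "ATGTCGAAGGGCGGACCACTTTGTACAAGAAAGCTGGGTAAGGGC",
    "CCTGTCATGTGGACCACTTTGTACAAGAAAGCTGGGTTTCTATAGTGTCACCTAAATC",
]
# First six bases of the lower-strand oligonucleotides of signals 1 to 4
# (assumed here to be the single-stranded 5' overhangs).
LOWER_OVERHANGS = ["TAGGAA", "GCAACT", "CGACAT", "GACAGG"]

# Each overhang should pair with the start of the next signal's upper strand.
junctions_ok = all(
    revcomp(overhang) == following[:6]
    for overhang, following in zip(LOWER_OVERHANGS, TOP[1:])
)
product_length = sum(len(strand) for strand in TOP)
print(junctions_ok, product_length)
```

Under these assumptions all four junctions are complementary and the joined upper strands total 243 bases, in line with the expected 243 bp product.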

EXAMPLE 8 - CONSTRUCTION OF A 5X5 FRAGMENT CHAIN
ENCODING THE BINARY SEQUENCE USING ONE LIGATION CYCLE
FOLLOWED BY ONE PCR CYCLE OR BY TWO LIGATION CYCLES

This experiment demonstrates the use of complementary
primer pairs to link fragment chains together as an
alternative to the ligation strategy demonstrated in the
previous example.

In this experiment 5 fragment chains with 5 positions
(fragments or bits) each are ligated separately in
ligation cycle 1 as demonstrated earlier (Example 7).
The 5 fragment chains are then amplified with 5
different primer pairs (pair 1 is used to amplify chain
1, pair 2 is used to amplify chain 2, etc). The second
primer in primer pair 1 is complementary to the first
primer in primer pair 2, the second primer in primer pair
2 is complementary to the first primer in primer pair 3,
and so on.

A small aliquot is then taken from each of the 5 PCR
reactions and a new PCR reaction is performed with
primers that are specific to the ends of signal chains 1
and 5. The method is illustrated in Figure 6.

Materials:

Oligonucleotides are selected which bind to the fragment
chain and also serve as primers. Thus, for example,
adjacent chains may be bound using the following primer
pairs:

fragment chain 2 terminal (with bound primer):
TTCTATAGTGTCACCTAAATC
AAGATATCACAGTGGATTTAGCCTACCAGTACATCCAACGGCAACT

fragment chain 3 terminal (with bound primer):
GTCATGTAGGTTGCCGTTGATCCATCCTAATACGACTCACTATAGCA
                           ATTATGCTGAGTGATATCGT

The above exemplified primer regions are complementary
and may thus be bound together.

As an alternative to this method, two ligation cycles
may be used in which 5 fragment chains (generated by
ligation) are ligated together. Thus, several
construction cycles may be used to build up long signal
chains. After the initial ligation in the first ligation
cycle, the 5 fragment chains are amplified with primers
containing a FokI site. The primers are appropriately
selected such that digestion with FokI will then make
non-palindromic overhangs in the end of each fragment
chain, in which the overhang generated in fragment chain
1 is able to ligate with the first overhang generated in
fragment chain 2, the second overhang generated in
fragment chain 2 is able to ligate with the first
overhang generated in fragment chain 3, and so on. The 5
fragment chains can thereby be ligated together in a
controlled manner to generate a final chain with 25
fragments (bits).

If we want to construct fragment chains with 100 or 500
fragments we can repeat this procedure 1 or 2 more
times. The polymerase capacity will, however, be a
limiting factor regarding how many ligation cycles it is
possible to perform. Other strategies will therefore
need to be employed to construct even longer chains.
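
The growth of chain length with the number of construction cycles can be sketched as below, assuming that every cycle joins five chains from the previous cycle so that the length is multiplied by five each time. The function name and the per_cycle parameter are hypothetical.

```python
def cycles_needed(target, per_cycle=5):
    """Smallest number of cycles whose chain length reaches `target` fragments."""
    cycles, length = 0, 1
    while length < target:
        length *= per_cycle
        cycles += 1
    return cycles, length


for target in (25, 100, 500):
    print(target, cycles_needed(target))
```

Two cycles give 25 fragments; a third cycle gives 125 (enough for 100) and a fourth gives 625 (enough for 500), matching the 1 or 2 further repetitions mentioned in the text.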

15 

EXAMPLE 9: CLONING OF AN INSERT FROM PHIX174 INTO PUC19
WITH A TRIMMED GENE A

This experiment demonstrates the "trimming" strategy for
elimination of unwanted flanking sequences. Another
important aspect of this experiment is that we
demonstrate that it is possible to link a 5' and a 3'
overhang together with a single stranded oligonucleotide
alone. It should also be noted that the inserts are
cloned into two different IIS sites, thereby eliminating
the problem with insert concatemerisation.

In this method, Gene A from PhiX174 is cloned into a
pUC-19 vector. PhiX174 is prepared by cleavage with
BbvI, resulting in 15 fragments flanked by different
non-palindromic 4-base 5' overhangs, as described in
more detail in Example 1. The two overhangs adjacent to
Gene A are then addressed with "initiation linkers"
containing a BplI site, while the rest of the fragments
are allowed to religate. T4 DNA ligase, BplI, a
"propagation linker" containing a BplI site, and two
"termination adaptors" addressed to the first and last
five bases of Gene A respectively are used. The
solution is incubated at 37°C, thereby allowing the
trimming reaction to proceed until it terminates when the
first and last five bases of Gene A are reached.

The pUC-19 vector is prepared by cleavage with HgaI and
BsaI. The overhangs generated by HgaI cleavage are
described in Example 1. Cleavage with BsaI results in 4
non-identical cleavages giving rise to 8 non-identical
overhangs, e.g. site 1: GCCA/CGGT (1600).

Gene A has the following sequence at its first and last
five bases (marked by underlining).

. . . GCTGGAGGCCTCCACTATGAAATCGCGTAGAG . . .
. . . CGACCTCCGGAGGTGATACTTTAGCGCATC

        CTGGCGGAAAATGAGAAAATTCGACCTA . . .
. . . ACGACCGCCTTTTACTCTTTTAAGCTGG

When terminating the trimming procedure at the
underlined sequences it is possible to clone Gene A
without any unwanted flanking base pairs. The 3' 5-base
overhangs generated by BplI correspond to the marked
base pairs.

The overhang pair generated by HgaI and BsaI in pUC19
that is used as a cloning site for Gene A from
PhiX174 is TTCTC/CGGT.

Method:

This is as described in Example 1 except that pUC19 is
cut with both HgaI (NEB 4, 37°C) and thereafter with
BsaI (NEB 4, 50°C).

Materials:

Initiation linker 1 (s):
5' ATT CGG TCG AGA TGC TCT CA 3'

Initiation linker 1 (as):
5' CGA CTG AGA GCA TCT CGA CCG AAT 3'

Initiation linker 2 (s):
5' GCG TTA CTG AGC GTA GCT CTG 3'

Initiation linker 2 (as):
5' CTC TCA GAG CTA CGC TCA GTA ACG C 3'

Propagation linker (s):
5' TGC TGC AGG AGC GAA TCT CNN NNN 3'

Propagation linker (as):
5' GAG ATT CGC TCC TGC AGC A 3'

Labeling linker 2 (s):
5' CTC TTG CTA TAG TGA GTC GTA TTA 3'

Labeling linker 2 (as):
5' TAA TAC GAC TCA CTA TAG CA 3'

Termination linker 1 (s):
5' AAG AGC TCA GGT CAT TGA CGT AGC TAT GAA 3'

Termination linker 1/2 (as):
5' AGC TAC GTC AAT GAC CTG AG 3'

Termination linker 1 (short version):
5' AAG AGA TGA A 3'

Termination linker 2 (s):
5' ACC GCT CAG GTC ATT GAC GTA GCT TCA TT 3'




Termination linker 2 (short version):
5' ACC GTC ATT 3'
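
The single stranded ends left when each sense/antisense pair anneals can be derived directly from the sequences listed above. The sketch below is illustrative only: the function names and the dictionary layout are assumptions, and it simply locates the reverse complement of the shorter strand within the longer strand and reports what is left unpaired on either side.

```python
# Derive the unpaired flanks of each linker duplex from the listed oligos.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def single_stranded_flanks(long_strand, short_strand):
    """Return the (5' flank, 3' flank) of long_strand that stays unpaired
    when short_strand anneals to it over its full length."""
    core = reverse_complement(short_strand)
    start = long_strand.find(core)
    if start < 0:
        raise ValueError("short strand does not anneal to long strand")
    return long_strand[:start], long_strand[start + len(core):]

# (longer strand, shorter strand), both written 5' to 3' as in the list.
LINKERS = {
    "initiation 1": ("CGACTGAGAGCATCTCGACCGAAT", "ATTCGGTCGAGATGCTCTCA"),
    "initiation 2": ("CTCTCAGAGCTACGCTCAGTAACGC", "GCGTTACTGAGCGTAGCTCTG"),
    "propagation": ("TGCTGCAGGAGCGAATCTCNNNNN", "GAGATTCGCTCCTGCAGCA"),
    "labeling 2": ("CTCTTGCTATAGTGAGTCGTATTA", "TAATACGACTCACTATAGCA"),
    "termination 1": ("AAGAGCTCAGGTCATTGACGTAGCTATGAA", "AGCTACGTCAATGACCTGAG"),
    "termination 2": ("ACCGCTCAGGTCATTGACGTAGCTTCATT", "AGCTACGTCAATGACCTGAG"),
}

flanks = {name: single_stranded_flanks(*pair) for name, pair in LINKERS.items()}
for name, (five_prime, three_prime) in flanks.items():
    print(f"{name}: 5' flank {five_prime or '-'}, 3' flank {three_prime or '-'}")
```

For the sequences as listed, the propagation linker carries a 5-base NNNNN 3' overhang, and each termination linker carries both a 5' and a 3' single stranded flank. The two flanks of each termination linker, joined end to end, give the corresponding "short version" oligonucleotide (AAGAG + ATGAA and ACCG + TCATT); this is an observation from the listed sequences rather than a statement made in the text.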



The efficiency of the trimming reaction may be assessed
as follows. Overhang 6) is addressed with a γ-32P
labelled adaptor. The trimming reaction is then allowed
to start from overhang 1). Aliquots are taken out at
regular time intervals and the size distribution of
the DNA fragments is then analysed on a gel.

Claims:

1. A method of attaching a fragment of a first nucleic 
acid molecule to a second nucleic acid molecule, wherein 
said method comprises at least the steps:

1) cleaving said first nucleic acid molecule with a
nuclease which has a cleavage site separate from its
recognition site to create at least one fragment of said
first nucleic acid molecule having a single stranded
nucleotide region (SS1a) at at least one terminus of
said fragment,

2) if necessary generating a single stranded
nucleotide region (SS2) at at least one terminus of said
second nucleic acid molecule,

3) binding to at least one single stranded region of
step 1) (SS1a) an adapter molecule comprising at one
terminus a single stranded region (SSA1) complementary
to the single stranded region of said first nucleic acid
molecule fragment (SS1a) and additionally comprising at
the other terminus a further single stranded region
(SSA2) complementary to the single stranded region (SS2)
at one terminus of said second nucleic acid molecule,

4) ligating said adapter to said first nucleic acid
fragment,

5) binding said adapter to said second nucleic acid
molecule, and

6) ligating said adapter to said second nucleic acid
molecule.

2. A method as claimed in claim 1 wherein said first
nucleic acid molecule fragment has a single stranded
nucleotide region at either terminus (SS1a and SS1b),
each of which is bound by an adapter, which may be the
same or different, and the first of said adapters is
bound to said second nucleic acid molecule and the
second of said adapters binds either to said second
nucleic acid molecule or to a third nucleic acid
molecule.

3. A method as claimed in claim 2, wherein said
adapters bind to the termini of said second nucleic acid
molecule, thereby forming a circular nucleic acid
molecule.

4. A method as claimed in any one of claims 1 to 3,
wherein said second nucleic acid molecule is a vector or
a fragment thereof and single stranded regions are
produced in step 2) by cleavage of said vector with a
nuclease.

5. A method as claimed in any one of claims 1 to 4,
wherein said adapter molecule additionally comprises one
or more nuclease recognition and cleavage sites.

6. A method as claimed in any one of claims 1 to 5,
wherein said nuclease is a restriction enzyme from the
class of IP or IIS enzymes.

7. A method as claimed in any one of claims 1 to 6,
wherein two or more fragments of the first nucleic acid
molecule are attached to different second and optionally
third nucleic acid molecules, or different termini
thereof.

8. A method as claimed in any one of claims 4 to 7,
wherein one or more fragments of said first nucleic acid
molecule are attached via adapters to single stranded
regions in said second nucleic acid molecule resulting
from different cleavage events.

9. A method as claimed in claim 7 or 8, wherein one or
more fragments of said first nucleic acid molecule are
attached via adapters to single stranded regions in two
or more second nucleic acid molecules.

10. A method as claimed in any one of claims 1 to 9,
wherein 2 or more first nucleic acid molecules are
cleaved and bound to one or more second nucleic acid
molecules by adapter molecules simultaneously in the
same reaction.

11. A method as claimed in any one of claims 1 to 10, 
wherein all the steps are conducted together. 

12. A nucleic acid molecule produced according to a
method as defined in any one of claims 1 to 11.

13. A cloning or expression vector containing the
nucleic acid molecule as defined in claim 12.

14. A eukaryotic or prokaryotic cell or transgenic
organism containing a vector as defined in claim 13.

15. A kit for attaching a first nucleic acid molecule
fragment to a second nucleic acid molecule or a fragment
thereof according to the method defined in any one of
claims 1 to 11 comprising at least (i) one or more
adapters as described in any one of claims 1 to 9, (ii)
the second nucleic acid molecule and (iii) a nuclease
which cleaves outside its recognition site, wherein the
terminus of one of said adapters has a single stranded
region complementary to a single stranded region
generated on said second nucleic acid molecule after
cleavage with said nuclease.

16. A method of synthesizing a double stranded nucleic 
acid molecule comprising at least the steps of: 

1) generating n double stranded nucleic acid
fragments, wherein at least n-2 fragments have single
stranded regions at both termini and 2 fragments have
single stranded regions at at least one terminus,
wherein (n-1) single stranded regions are complementary
to (n-1) other single stranded regions, thereby
producing (n-1) complementary pairs,

2) contacting said n double stranded nucleic acid
fragments, simultaneously or consecutively, to effect
binding of said complementary pairs of single stranded
regions, and

3) optionally ligating said complementary pairs
simultaneously or consecutively to produce a nucleic
acid molecule consisting of n fragments.

17. A method as claimed in claim 16 wherein said 
fragments are each between 8 and 25 bases in length. 

18. A method as claimed in claim 16 or 17 wherein n is
at least 10.

19. A method as claimed in any one of claims 16 to 18
wherein said fragment comprises a region representing a
unit of information corresponding to one or more code
elements.

20. A method as claimed in claim 19 wherein said code 
is alphanumeric. 

21. A method as claimed in claim 20 wherein said code
is binary.

22. A method as claimed in any one of claims 19 to 21
wherein each of said one or more code elements has the
formula

(X)a,

wherein
X is a nucleotide A, T, G, C or a derivative
thereof which allows complementary binding and may be
the same or different at each position, and
a is an integer from 4 to 10,
wherein (X)a is different for each one or more code
elements.

23. A method as claimed in claim 22, wherein said code
is binary and the code elements "1" and "0" have the
formulae:

"0" = (X)a and "1" = (Y)b,

wherein
(X)a and (Y)b are not identical,
X and Y are each a nucleotide A, T, G, C or a
derivative thereof which allows complementary binding
and may be the same or different at each position, and
a and b are integers from 4 to 10.

24. A method as claimed in claim 23 wherein in the
formulae (X)a and (Y)b, X and Y are the same at each
position.

25. A method of synthesizing a double stranded nucleic
acid molecule comprising at least the steps of:

1) generating fragment chains according to the method
defined in any one of claims 16 to 24;

2) optionally generating single stranded regions at
the end of said fragment chains, wherein said single
stranded regions are complementary to other single
stranded regions on said fragment chains thus forming
complementary pairs of single stranded regions;

3) contacting said fragment chains with one another,
simultaneously or consecutively, to effect binding of
said complementary pairs of single stranded regions.

26. A nucleic acid molecule produced according to a
method as defined in any one of claims 16 to 25, or a
single stranded nucleic acid molecule thereof.

27. A method of identifying the code elements contained
in a nucleic acid molecule prepared according to a
method as defined in any one of claims 16 to 25, wherein
a probe, carrying a signalling means, specific to one or
more code elements, is bound to said nucleic acid
molecule and a signal generated by said signalling means
is detected, whereby said one or more code elements may
be identified.

28. A library of fragments as defined in any one of
claims 16 to 27, comprising (n)m fragments, wherein n is
as defined in any one of claims 16 to 27 and corresponds
to the length of chain that said library may produce,
and m is an integer corresponding to the number of
possible code elements or combinations thereof, such
that fragments corresponding to all possible code
elements for each position in the final chain are
provided.

              "0" starting fragments    "1" starting fragments

Position 1    GGGG GGGGAAA              AAAAAAAAAAA
              CCCCCCCCC                 TTTTTTTT

Position 2    GGGG GGGGAAC              AAAAAAAAAC
              TTTCCCCCCCCC              TTTTTTTTTTT

Position 7    GGGG GGGGCCG              AAAAAAACCG
              GCGCCCCCCCCC              GCGTTTTTTTT

Position 8    GGGG GGGG                 AAAAAAA
              GGCCCCCCCCCC              GGCTTTTTTTT

FIG. 2



                Fragment 0              Fragment 1

Position 1.1    GGGG GGGGAAA            AAAAAAAAAA
                CCCCCCCCC               TTTTTTTT

Position 1.2    AAAGGGG GGGGAAA         AAAAAAAAAAAAA
                CCCCCCCCC               TTTTTTTT

Position 1.3    AACGGGG GGGGAAA         AACAAAAAAAAAA
                CCCCCCCCC               TTTTTTTT

Position 8.1    GGGG GGGG               AAAAAAA
                GCCCCCCCCCCTTT          GCTTTTTTTTTTT

Position 8.2    GGGG GGGG               AAAAAAA
                GCCCCCCCCCCTTG          GCTTTTTTTTTTG

Position 8.3    GGGG GGGG               AAAAAAA
                GCCCCCCCCCCTTC          GCTTTTTTTTTTC

FIG. 3
The error appears in the priority claim which was made. Please note however that the date of
27 June 1999 (the filing date given for application No. 19991325) has already been corrected
under Rule 26bis.1(a) and a copy of the communication from the International Bureau [...]

[...] the priority claim as it stands, even though those filing dates are erroneously matched with the
wrong application number. As such, all the relevant details of the priority applications and their
filing dates are provided in the original priority claim and the correct numbers could readily be
married with the correct application numbers. It is therefore submitted that the priority claim
comprises an obvious error which should be correctable.

In this respect I enclose herewith a copy of the front page of the certification of Norwegian
Patent Application Nos. 20003190 and 20003191 (which have the filing date of 28 June 1999)
which describes the origin of these applications. Taking first Application No. 20003190, you [...]

[...] 2000 but was accorded a filing date of 28 June 1999. The second document Application No. [...]

Correction of the priority claim is therefore requested such that the following priority is claimed:
Norwegian Patent Application No. 20003190 filed on 28 June 1999 and
Norwegian Patent Application No. 20003191 filed on 28 June 1999.

In the event that correction of this obvious error is refused, I propose to request that the 
International Bureau publish this request for rectification under Rule 91.1(f) PCT in the 
publication of this application which is due to occur on 4 January 2001. In view of the
imminence of the completion of the technical preparations for publication which I am informed 
will be around 15 December 2000, your immediate attention to this matter would be 
appreciated. A copy of this letter has been forwarded to the International Bureau. 

I look forward to hearing from you regarding the above-mentioned rectification.

Frank B. Dehn & Co.